Information on Reference Frames

I have been working with LiDAR pointclouds and their associated bounding boxes. I was a bit confused about what the various reference frames are and what objects are part of them.

More specifically, I am confused by this block of code:
```python
# Points live in the point sensor frame. So they need to be transformed via global to the image plane.
# First step: transform the point cloud to the ego vehicle frame for the timestamp of the sweep.
cs_record = self.nusc.get('calibrated_sensor', pointsensor['calibrated_sensor_token'])
pc.rotate(Quaternion(cs_record['rotation']).rotation_matrix)
pc.translate(np.array(cs_record['translation']))

# Second step: transform to the global frame.
poserecord = self.nusc.get('ego_pose', pointsensor['ego_pose_token'])
pc.rotate(Quaternion(poserecord['rotation']).rotation_matrix)
pc.translate(np.array(poserecord['translation']))

# Third step: transform into the ego vehicle frame for the timestamp of the image.
poserecord = self.nusc.get('ego_pose', cam['ego_pose_token'])
pc.translate(-np.array(poserecord['translation']))
pc.rotate(Quaternion(poserecord['rotation']).rotation_matrix.T)

# Fourth step: transform into the camera.
cs_record = self.nusc.get('calibrated_sensor', cam['calibrated_sensor_token'])
pc.translate(-np.array(cs_record['translation']))
pc.rotate(Quaternion(cs_record['rotation']).rotation_matrix.T)
```
Here it says that in the first step it takes the points from the point cloud, which are in the point sensor frame, and transforms them into the ego vehicle frame for the timestamp of the sweep, and then into the global frame. However, I am confused because of a separate function that I use to figure out which points are in a box:

```python
def points_in_box(box: 'Box', points: np.ndarray, wlh_factor: float = 1.0):
    """
    Checks whether points are inside the box.
    Picks one corner as reference (p1) and computes the vector to a target point (v).
    Then for each of the 3 axes, project v onto the axis and compare the length.
    Inspired by: https://math.stackexchange.com/a/1552579
    :param box: <Box>.
    :param points: <np.float: 3, n>.
    :param wlh_factor: Inflates or deflates the box.
    :return: <np.bool: n, >.
    """
    corners = box.corners(wlh_factor=wlh_factor)

    p1 = corners[:, 0]
    p_x = corners[:, 4]
    p_y = corners[:, 1]
    p_z = corners[:, 3]

    i = p_x - p1
    j = p_y - p1
    k = p_z - p1

    v = points - p1.reshape((-1, 1))

    iv = np.dot(i, v)
    jv = np.dot(j, v)
    kv = np.dot(k, v)

    mask_x = np.logical_and(0 <= iv, iv <= np.dot(i, i))
    mask_y = np.logical_and(0 <= jv, jv <= np.dot(j, j))
    mask_z = np.logical_and(0 <= kv, kv <= np.dot(k, k))
    mask = np.logical_and(np.logical_and(mask_x, mask_y), mask_z)

    return mask
```
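(As a self-contained sanity check of the projection test above, here is a hypothetical standalone version that takes raw corners instead of a Box object. The corner layout — p1 at column 0, p_x at column 4, p_y at column 1, p_z at column 3 — is assumed to mirror the function; only the geometry matters:)

```python
import numpy as np

def points_in_box_np(corners: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Same corner-projection test as points_in_box, but on raw corners <3, 8>."""
    p1, p_x, p_y, p_z = corners[:, 0], corners[:, 4], corners[:, 1], corners[:, 3]
    i, j, k = p_x - p1, p_y - p1, p_z - p1
    v = points - p1.reshape((-1, 1))
    iv, jv, kv = np.dot(i, v), np.dot(j, v), np.dot(k, v)
    mask_x = np.logical_and(0 <= iv, iv <= np.dot(i, i))
    mask_y = np.logical_and(0 <= jv, jv <= np.dot(j, j))
    mask_z = np.logical_and(0 <= kv, kv <= np.dot(k, k))
    return mask_x & mask_y & mask_z

# Unit cube [0, 1]^3, with the four corners the function reads
# placed at the assumed column indices.
corners = np.zeros((3, 8))
corners[:, 0] = [0, 0, 0]  # p1
corners[:, 4] = [1, 0, 0]  # p_x: one box length along x
corners[:, 1] = [0, 1, 0]  # p_y: one box width along y
corners[:, 3] = [0, 0, 1]  # p_z: one box height along z

points = np.array([[0.5, 2.0, 1.0],   # x coordinates
                   [0.5, 0.0, 0.0],   # y coordinates
                   [0.5, 0.0, 1.0]])  # z coordinates
print(points_in_box_np(corners, points))  # [ True False  True]
```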
I do not do any transformations on either the point cloud or the box. Additionally, to find the distance between the box and the ego vehicle, I saw in another post that I could just take the Euclidean distance between the center of the box and the translation of the ego_pose, as they are both in global coordinates. This confuses me because I was under the impression that the point cloud would be in a different reference frame than the box.

As such, I was hoping to get some details on what exactly the various reference frames (point sensor frame, ego frame for timestamp of sweep, global frame) are referring to, how they are related, and which objects are in particular reference frames in context of the code blocks above.

Hi,
we have the following coordinate frames:

  • Sensor: Frame that the sensor captures the data in. Axes vary by type of sensor, see https://www.nuscenes.org/public/images/data.png.
  • Ego vehicle: The position of the rear axle projected onto the flat ground. The axes follow the same convention as the lidar (but are 1-2 m behind it and on the ground). Note that every sensor reading has its own timestamp, and therefore even for one keyframe the timestamp and ego vehicle position may vary slightly.
  • Global: The map coordinates in m from the bottom left corner of the map.

For the conventions see https://github.com/nutonomy/nuscenes-devkit/blob/master/schema.md. ego_pose and sample_annotation are both relative to global coordinates.
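Since ego_pose and sample_annotation are both in global coordinates, the box-to-ego distance mentioned above is just a Euclidean distance between the two translations. A minimal sketch, with made-up numbers standing in for the record values:

```python
import numpy as np

# Hypothetical values in the shape the devkit stores them:
# a sample_annotation's 'translation' (box center) and an ego_pose's
# 'translation', both [x, y, z] in global (map) coordinates in meters.
box_center = np.array([410.0, 1180.0, 1.5])
ego_translation = np.array([400.0, 1170.0, 0.0])

dist = np.linalg.norm(box_center - ego_translation)
print(dist)
```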

For the code snippets:

  • First block: Due to the different ego poses at different timestamps you need to map from one ego pose to global to another ego pose and then into the camera.
  • points_in_box: This assumes that box and points are in the same frame, e.g. the lidar frame. Hence no transformation inside that function.
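The four steps of the first block can also be written as one chain of 4x4 homogeneous transforms. The sketch below uses made-up rotations and translations in place of the calibrated_sensor and ego_pose records, and checks that mapping sensor → ego → global and then back is the identity:

```python
import numpy as np

def transform_matrix(rotation: np.ndarray, translation: np.ndarray,
                     inverse: bool = False) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    tm = np.eye(4)
    if inverse:
        tm[:3, :3] = rotation.T
        tm[:3, 3] = -rotation.T @ translation
    else:
        tm[:3, :3] = rotation
        tm[:3, 3] = translation
    return tm

# Made-up extrinsics: a 90-degree yaw plus an offset for sensor -> ego
# (like a calibrated_sensor record), identity rotation plus a map offset
# for ego -> global (like an ego_pose record).
yaw = np.pi / 2
R_cs = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                 [np.sin(yaw),  np.cos(yaw), 0.0],
                 [0.0,          0.0,         1.0]])
t_cs = np.array([1.0, 0.0, 1.8])
R_ep = np.eye(3)
t_ep = np.array([400.0, 1100.0, 0.0])

sensor_to_global = transform_matrix(R_ep, t_ep) @ transform_matrix(R_cs, t_cs)
global_to_sensor = (transform_matrix(R_cs, t_cs, inverse=True)
                    @ transform_matrix(R_ep, t_ep, inverse=True))

pts = np.array([[2.0, 0.0, 0.0, 1.0]]).T  # one homogeneous point in the sensor frame
round_trip = global_to_sensor @ sensor_to_global @ pts
print(np.allclose(round_trip, pts))  # True
```

Steps three and four in the original snippet are exactly the `inverse=True` branch: negate-translate first, then rotate with the transposed rotation matrix.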

It is interesting to discover that neither the paper nor the GitHub repo actually defines the coordinate system of the ego vehicle frame, and I suspect that what you say here is not accurate. The ego vehicle frame follows the IMU coordinate frame, not the LiDAR frame. By looking at the front camera pose w.r.t. the ego vehicle frame, I observe a pattern in the rotation matrix; it is close to

[0, 0, 1]
[-1, 0, 0]
[0, -1, 0]

Rotating the unit vectors ex, ey, ez with this matrix, you get -ey, -ez, ex respectively. This rotation takes us from the camera frame to the ego frame, and the result aligns with the IMU coordinate frame, not the LiDAR frame.
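That pattern is easy to check numerically: applying the matrix above to the unit vectors gives exactly -ey, -ez, ex.

```python
import numpy as np

# The approximate camera->ego rotation pattern quoted above.
R = np.array([[ 0,  0,  1],
              [-1,  0,  0],
              [ 0, -1,  0]])

ex, ey, ez = np.eye(3)  # unit vectors
print(R @ ex)  # [ 0 -1  0] -> -ey
print(R @ ey)  # [ 0  0 -1] -> -ez
print(R @ ez)  # [ 1  0  0] ->  ex
```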

We define the coordinate systems in https://www.nuscenes.org/public/images/data.png and you can find the exact numbers in the dataset. Compared to other datasets, everything in nuScenes is set up so that you don't need to know the coordinate frames. You just need to use the transformations from the files. This makes it more beginner-friendly.

@holger-nutonomy, thanks for the response. I am still a bit confused as to why I am getting the correct results from my points_in_box function. I understand that there are no transformations inside the function, since it assumes that the box and the points are in the same frame. However, when I use the function I do not apply any transformations to the box or the points before passing them in, yet points_in_box still returns the correct number of points for the annotation (as verified against the annotation's num_lidar_pts field). This confuses me, as I would expect the boxes to be in the global coordinate frame and the lidar points to be in the sensor frame, in which case points_in_box should not be able to identify the points correctly. I have attached a simple example of the code I am running as well as my output.

```python
from nuscenes.nuscenes import NuScenes
from nuscenes.utils.data_classes import LidarPointCloud, RadarPointCloud, Box
from pyquaternion import Quaternion
import matplotlib.pyplot as plt
import numpy as np
import os
import pdb

current_dir = os.getcwd()
dataroot = os.path.join(current_dir, "data", "sets", "nuscenes")
nusc = NuScenes(version='v1.0-mini', dataroot=dataroot, verbose=True)


def points_in_box(box: 'Box', points: np.ndarray, wlh_factor: float = 1.0):
    """
    Checks whether points are inside the box.
    Picks one corner as reference (p1) and computes the vector to a target point (v).
    Then for each of the 3 axes, project v onto the axis and compare the length.
    Inspired by: https://math.stackexchange.com/a/1552579
    :param box: <Box>.
    :param points: <np.float: 3, n>.
    :param wlh_factor: Inflates or deflates the box.
    :return: <np.bool: n, >.
    """
    corners = box.corners(wlh_factor=wlh_factor)

    p1 = corners[:, 0]
    p_x = corners[:, 4]
    p_y = corners[:, 1]
    p_z = corners[:, 3]

    i = p_x - p1
    j = p_y - p1
    k = p_z - p1

    v = points - p1.reshape((-1, 1))

    iv = np.dot(i, v)
    jv = np.dot(j, v)
    kv = np.dot(k, v)

    mask_x = np.logical_and(0 <= iv, iv <= np.dot(i, i))
    mask_y = np.logical_and(0 <= jv, jv <= np.dot(j, j))
    mask_z = np.logical_and(0 <= kv, kv <= np.dot(k, k))
    mask = np.logical_and(np.logical_and(mask_x, mask_y), mask_z)

    return mask


my_scene = nusc.scene[0]
# print(my_scene)

first_sample_token = my_scene['first_sample_token']
sample = nusc.get('sample', first_sample_token)
ann_token = sample['anns'][18]
ann = nusc.get('sample_annotation', ann_token)
print(ann)

lidar_token = sample['data']['LIDAR_TOP']
data_path, boxes, camera_intrinsic = nusc.get_sample_data(lidar_token, selected_anntokens=[ann_token])
lidar_rec = nusc.get('sample_data', lidar_token)
pointcloud = LidarPointCloud.from_file(data_path)

mask = points_in_box(boxes[0], pointcloud.points[:3])
ann_points = pointcloud.points[:3, mask]
print(ann_points.shape)
```

I think what you are missing is that get_sample_data transforms the boxes into the lidar frame, as its docstring notes:

        Returns the data path as well as all annotations related to that sample_data.
        Note that the boxes are transformed into the current sensor's coordinate frame.

So both your boxes and lidar points are in the lidar frame, which is valid.
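Equivalently, you could keep the box in global coordinates and undo the two poses yourself: get_sample_data applies the inverse ego_pose transform and then the inverse calibrated_sensor transform to each box. A sketch on a bare center point, with hypothetical pose values in place of the real records (identity rotations, pure offsets, for readability):

```python
import numpy as np

def global_to_sensor(point_global, ego_rot, ego_trans, cs_rot, cs_trans):
    """Undo ego_pose, then calibrated_sensor, mirroring get_sample_data's box handling."""
    p = point_global - ego_trans  # undo the ego translation in the map
    p = ego_rot.T @ p             # undo the ego rotation
    p = p - cs_trans              # undo the sensor mounting offset
    return cs_rot.T @ p           # undo the sensor mounting rotation

# Hypothetical records: the ego vehicle sits 400 m / 1100 m into the map,
# the lidar is mounted 1 m forward and 1.8 m up from the ego origin.
ego_rot, ego_trans = np.eye(3), np.array([400.0, 1100.0, 0.0])
cs_rot, cs_trans = np.eye(3), np.array([1.0, 0.0, 1.8])

box_center_global = np.array([410.0, 1100.0, 1.8])
print(global_to_sensor(box_center_global, ego_rot, ego_trans, cs_rot, cs_trans))  # [9. 0. 0.]
```

After this, the box center and the raw lidar points share the sensor frame, which is why points_in_box works without any further transformations.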

Ah, thank you, that makes sense.