0 lidar+radar points

Hi,
I was looking at the dataset and found that there are several annotations for objects with 0 lidar + radar points. How do you get the depth estimate for such objects?

Also, in the evaluation, these objects are filtered out by the devkit. What’s the reason for that?

Thanks,
Shubham

We remove ground-truth (GT) boxes without any lidar or radar points in them, as we cannot guarantee that they are actually visible in the frame (please see https://github.com/nutonomy/nuscenes-devkit/tree/master/python-sdk/nuscenes/eval/detection#preprocessing for more details).
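
Roughly, the check relies on the `num_lidar_pts` and `num_radar_pts` fields that each nuScenes sample annotation carries. The snippet below is a simplified sketch of that kind of filter, not the exact devkit code:

```python
# Simplified sketch of the point-based GT filter described above.
# Assumes annotation records with the standard nuScenes fields
# 'num_lidar_pts' and 'num_radar_pts'; the real logic lives in the
# devkit's evaluation loaders and may differ in detail.

def keep_annotation(ann: dict) -> bool:
    """Keep a GT box only if at least one lidar or radar point falls inside it."""
    return (ann.get('num_lidar_pts', 0) + ann.get('num_radar_pts', 0)) > 0

# Example: filter a list of ground-truth annotations before evaluation.
annotations = [
    {'token': 'a', 'num_lidar_pts': 12, 'num_radar_pts': 0},  # visible to lidar -> kept
    {'token': 'b', 'num_lidar_pts': 0,  'num_radar_pts': 0},  # no returns -> removed
]
filtered = [ann for ann in annotations if keep_annotation(ann)]
```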

I see. Is it possible to share how exactly you obtain the GT annotations for such boxes? I’m guessing it uses the previous frames…

Hi. The tooling we use can show all cameras, lidars and radars for the 20s scene. Objects are annotated with interpolation. That means that if we see an object at 1s and at 3s, the frames in between are linearly interpolated, even if the object is occluded in them. For evaluation purposes we later remove the boxes without lidar/radar points, as there was no chance for our detector to detect these. Of course there can be all kinds of exceptions, e.g. you often see an object through the window of a car but get no lidar returns from it.
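
To make the interpolation idea concrete, here is a toy sketch (not the actual annotation tooling): the box pose at an intermediate timestamp is linearly interpolated between two annotated keyframes, with the orientation slerped. The function name and exact inputs are illustrative only; `pyquaternion` is the rotation library the nuScenes devkit itself uses.

```python
# Toy illustration of keyframe interpolation: the pose at time t is blended
# between the poses annotated at t0 and t1, even if the object is occluded at t.
import numpy as np
from pyquaternion import Quaternion


def interpolate_box(t, t0, t1, center0, center1, rot0: Quaternion, rot1: Quaternion):
    """Linearly interpolate the box center and slerp the orientation at time t in [t0, t1]."""
    alpha = (t - t0) / (t1 - t0)
    center = (1 - alpha) * np.asarray(center0) + alpha * np.asarray(center1)
    rotation = Quaternion.slerp(rot0, rot1, amount=alpha)
    return center, rotation


# Object annotated at 1 s and 3 s; its pose at 2 s is interpolated.
center, rotation = interpolate_box(
    t=2.0, t0=1.0, t1=3.0,
    center0=[10.0, 2.0, 0.5], center1=[14.0, 2.5, 0.5],
    rot0=Quaternion(axis=[0, 0, 1], angle=0.0),
    rot1=Quaternion(axis=[0, 0, 1], angle=0.2),
)
```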


Thanks a lot for the info. Although the filtering seems fair, I think it might bias the detection metric in favour of lidar-based methods compared to camera-based ones.

Yes, that is a fair point. But I would say it was never “fair” to begin with, since vision-based methods primarily have uncertainty along the depth dimension, which makes it very hard to achieve a good mAP at a large distance.

I think at a large distance lidar doesn’t even get enough points, so it’s even harder for it to detect objects. RGB should do better there, although the depth estimate would not be accurate. Defining a fair metric across modalities is quite complex, especially when evaluating at large distances.
