Clarification on timestamps and capture frequency of sweeps

So I want to use the CAM_FRONT sweeps provided by the nuScenes dataset for unsupervised learning. However, it is unclear to me what the actual capture frequency is. The website says it is 12 Hz, but if that's the case, the timestamps don't seem to make sense.
I read somewhere that the timestamps are in microseconds, but when I analyze them, the differences between consecutive frames are usually 50, 100, 150, 200 or 250 ms. 12 Hz would correspond to about 83 ms between frames, which never seems to occur.
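
For reference, the interval check described above can be sketched roughly as follows. This is only an illustration: the function name `frame_gaps_ms` and the timestamp values are made up, and it assumes the timestamps are integers in microseconds.

```python
from collections import Counter

def frame_gaps_ms(timestamps_us, step_ms=50):
    """Histogram of gaps between consecutive frames, snapped to multiples of step_ms.

    Assumes timestamps_us holds integer timestamps in microseconds.
    """
    ts = sorted(timestamps_us)
    # Convert each consecutive difference from microseconds to milliseconds.
    gaps = [(b - a) / 1000.0 for a, b in zip(ts, ts[1:])]
    # Snap to the nearest multiple of step_ms to absorb small jitter.
    return Counter(int(round(g / step_ms)) * step_ms for g in gaps)

# Made-up timestamps for illustration only, not taken from the dataset.
example_us = [0, 50_000, 150_000, 250_000, 300_000, 550_000]
print(frame_gaps_ms(example_us))
```

With real data, the list would hold the timestamps of the CAM_FRONT sweeps of one scene, in capture order.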

So what exactly is going on? Are the frames captured at even intervals at 12 Hz, with timestamps that are not exact? Do the bigger intervals mean that some frames have been skipped?
Or are the frames captured at varying intervals, with timestamps that are exact and correct?

Hi, your observation is correct.
In the dataset description we write:

In order to achieve good cross-modality data alignment between the LIDAR and the cameras, the exposure of a camera is triggered when the top LIDAR sweeps across the center of the camera’s FOV. The timestamp of the image is the exposure trigger time; and the timestamp of the LIDAR scan is the time when the full rotation of the current LIDAR frame is achieved. Given that the camera’s exposure time is nearly instantaneous, this method generally yields good data alignment. Note that the cameras run at 12Hz while the LIDAR runs at 20Hz. The 12 camera exposures are spread as evenly as possible across the 20 LIDAR scans, so not all LIDAR scans have a corresponding camera frame. Reducing the frame rate of the cameras to 12Hz helps to reduce the compute, bandwidth and storage requirement of the perception system.

So the gap between consecutive camera frames is 50 ms (one LIDAR scan period at 20 Hz) or a multiple of 50 ms, which averages out to about 12 Hz overall. Some non-keyframes may also have been dropped due to high system load, which could explain the very large gaps (e.g. 250 ms). If a keyframe was dropped, we discarded the entire scene.
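
To make the arithmetic concrete, here is a minimal sketch of how 12 exposures per second can be spread over 20 LIDAR scans. The slot formula `floor(k * 20 / 12)` and the function names are assumptions for illustration; the actual trigger logic on the vehicle may differ.

```python
LIDAR_HZ = 20
CAMERA_HZ = 12
LIDAR_PERIOD_MS = 1000 // LIDAR_HZ  # 50 ms between LIDAR scans

def exposure_slots(n_frames):
    # Hypothetical scheme: camera frame k fires on LIDAR scan floor(k * 20 / 12).
    return [(k * LIDAR_HZ) // CAMERA_HZ for k in range(n_frames)]

def intervals_ms(slots):
    # Gap between consecutive exposures, in milliseconds.
    return [(b - a) * LIDAR_PERIOD_MS for a, b in zip(slots, slots[1:])]

# 13 frames give 12 intervals, which span exactly one second.
gaps = intervals_ms(exposure_slots(13))
print(gaps)
print(sum(gaps) / len(gaps))
```

Under this assumed scheme every gap is either 50 ms or 100 ms, and the 12 gaps sum to 1000 ms, so the mean is 1000 / 12, roughly 83.3 ms, even though no single gap has that value. Gaps of 150 ms or more do not arise from the scheme itself, which is consistent with them being caused by dropped frames.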