After exploring the dataset, I found that some scenes, such as scene-0043, have only 39 annotated samples, and others, such as scene-0042, have 41, while the majority of scenes have 40 annotated samples.

Am I right about this, or am I doing something wrong and each scene does in fact have exactly 40 annotated samples?
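One way to reproduce per-scene counts like these is to walk each scene's chain of sample records from the first sample until the `next` pointer is empty. This is a minimal sketch that assumes records shaped like the nuScenes `scene` and `sample` tables (`name`, `first_sample_token`, `nbr_samples`, and `next`, with `next` set to an empty string on the last sample). The toy data and the `make_scene` helper are hypothetical. With the devkit you would take the scenes from `nusc.scene` and look samples up with `nusc.get('sample', token)` instead of the plain dict used here.

```python
from typing import Dict, List


def count_samples(scene: dict, samples: Dict[str, dict]) -> int:
    """Count a scene's keyframes by following the sample linked list."""
    count = 0
    token = scene["first_sample_token"]
    while token:  # assumed schema: the last sample has next == ""
        count += 1
        token = samples[token]["next"]
    return count


def samples_per_scene(scenes: List[dict], samples: Dict[str, dict]) -> Dict[str, int]:
    """Map each scene name to its number of annotated samples."""
    return {scene["name"]: count_samples(scene, samples) for scene in scenes}


def make_scene(name: str, n: int, samples: Dict[str, dict]) -> dict:
    """Hypothetical helper: build a toy scene with n (>= 1) chained samples."""
    tokens = [f"{name}-s{i}" for i in range(n)]
    for i, tok in enumerate(tokens):
        samples[tok] = {"token": tok, "next": tokens[i + 1] if i + 1 < n else ""}
    return {"name": name, "first_sample_token": tokens[0], "nbr_samples": n}


if __name__ == "__main__":
    samples: Dict[str, dict] = {}
    # Toy data only: counts mirror the ones reported above.
    scenes = [
        make_scene("scene-0042", 41, samples),
        make_scene("scene-0043", 39, samples),
        make_scene("toy-scene", 40, samples),
    ]
    counts = samples_per_scene(scenes, samples)
    # The walked count should agree with the scene's own nbr_samples field.
    for scene in scenes:
        assert counts[scene["name"]] == scene["nbr_samples"]
    print(counts)  # {'scene-0042': 41, 'scene-0043': 39, 'toy-scene': 40}
```

Cross-checking against `nbr_samples` helps separate a genuine variation in the data from a bug in your own traversal.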

That's correct. Keyframes are annotated at 2 Hz, so a 20 s scene nominally has 40 of them, but because the sensor frequency fluctuates, a scene sometimes contains more or fewer keyframes in those 20 s.
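To see how a small timing drift can change the count, consider a simplified model. This is an assumption for illustration, not the devkit's actual keyframe selection: keyframes fall at t = 0, p, 2p, … and we count those strictly inside a fixed 20 s window. At the nominal 2 Hz (p = 0.5 s) this gives 40. The slightly shorter and longer periods below are hypothetical values chosen only to show how 41 or 39 can arise.

```python
import math


def keyframes_in_window(window_s: float, period_s: float) -> int:
    """Count keyframes at t = 0, p, 2p, ... with t < window_s (simplified model)."""
    # k * p < W  <=>  k < W / p, so k = 0 .. ceil(W / p) - 1.
    return math.ceil(window_s / period_s)


if __name__ == "__main__":
    print(keyframes_in_window(20.0, 0.5))   # 40  (nominal 2 Hz)
    print(keyframes_in_window(20.0, 0.49))  # 41  (hypothetical, slightly fast)
    print(keyframes_in_window(20.0, 0.52))  # 39  (hypothetical, slightly slow)
```

Even a drift of a few percent in the effective keyframe period is enough to move the count by one in either direction.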

