How to load trainval data

Hi all,

When loading the mini dataset, I use the following code:

import os
from nuscenes.nuscenes import NuScenes

current_dir = os.getcwd()

dataroot = os.path.join(current_dir, "data", "sets", "nuscenes", "mini")

nusc = NuScenes(version="v1.0-mini", dataroot=dataroot, verbose=True)

My mini folder looks like this:

[Screenshot: contents of the mini folder]

I have downloaded two parts of the trainval set, but I am a bit confused about how to combine the parts and access the entire set. After extracting everything from the .tgz files, my directory structure looks like this:

[Screenshot: directory structure after extracting the trainval parts]

As you can see, there are two separate folders, one for trainval01 and one for trainval02, and I am unsure how to deal with this. Do I have to consolidate all the data into one folder? Should I simply set dataroot to the directory that contains the trainval01 and trainval02 folders? Or am I supposed to load each part separately with multiple instances of the NuScenes class? Any clarification would be extremely helpful.

Hi @rsvarma,
Yes, the data from all 10 compressed archives needs to be consolidated into a single folder. Your folder structure should end up looking like this:

└── /data/sets/nuscenes
    ├── maps
    ├── samples
    ├── sweeps
    └── v1.0-{mini, test, trainval}
        ├── Usual files (e.g. attribute.json, calibrated_sensor.json etc.)
        └── category.json  <- contains the categories of the labels
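If the archives have already been extracted into separate folders, as in the question above, one way to consolidate them is to move every file to the same relative path under a single dataroot. The sketch below assumes each part folder directly holds some subset of maps, samples, sweeps and v1.0-trainval; merge_parts is a hypothetical helper written for illustration, not part of the nuScenes devkit.

```python
import shutil
from pathlib import Path


def merge_parts(part_dirs, dataroot):
    """Move files from separately extracted parts into one dataroot.

    Hypothetical helper: part/samples/CAM_FRONT/x.jpg ends up at
    dataroot/samples/CAM_FRONT/x.jpg. Returns the number of files moved.
    """
    dataroot = Path(dataroot)
    dataroot.mkdir(parents=True, exist_ok=True)
    moved = 0
    for part in map(Path, part_dirs):
        # Collect the file list first so we never move files mid-iteration.
        files = [p for p in part.rglob("*") if p.is_file()]
        for src in files:
            dst = dataroot / src.relative_to(part)
            dst.parent.mkdir(parents=True, exist_ok=True)
            if dst.exists():
                # Same relative path already present (e.g. a shared file): keep the first copy.
                continue
            shutil.move(str(src), str(dst))
            moved += 1
    return moved
```

A call would look like merge_parts(["/path/to/trainval01", "/path/to/trainval02"], "/data/sets/nuscenes"), where the two part paths are placeholders for wherever the parts were extracted. Moving rather than copying avoids temporarily doubling the disk usage of a dataset this size.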

Then, assuming you want to use the trainval split, all you have to do is instantiate the NuScenes class with:

from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version='v1.0-trainval', dataroot='/data/sets/nuscenes', verbose=True)
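Before constructing the NuScenes object, it can help to confirm that the dataroot matches the tree above, since a missing or misplaced folder may only show up as an error later on. The helper below is a hypothetical sketch, not a devkit function; it only checks that the expected top-level directories exist and does not validate their contents.

```python
from pathlib import Path

# Top-level directories the layout above expects under the dataroot.
EXPECTED = ("maps", "samples", "sweeps")


def missing_dirs(dataroot, version="v1.0-trainval"):
    """Return the expected sub-directories that are absent from dataroot."""
    root = Path(dataroot)
    required = (*EXPECTED, version)
    return [name for name in required if not (root / name).is_dir()]
```

For example, checking that missing_dirs('/data/sets/nuscenes') returns an empty list before calling NuScenes(version='v1.0-trainval', ...) catches a forgotten archive early.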

To add to that, a .tgz file is a tar archive that has been compressed with gzip. You will need to extract each archive (for example with tar -xzf) to get the structure that @Whye_Kit_Fong described.
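Extracting every archive straight into the same dataroot avoids a separate merge step, because extraction adds files to directories that already exist. The sketch below uses Python's standard tarfile module and assumes that all downloaded .tgz files sit in one folder and that each archive stores its contents relative to the dataroot (e.g. samples/..., sweeps/...); extract_all is a hypothetical name used only for illustration.

```python
import tarfile
from pathlib import Path


def extract_all(archive_dir, dataroot, pattern="*.tgz"):
    """Extract every gzip-compressed tar archive in archive_dir into dataroot."""
    dataroot = Path(dataroot)
    dataroot.mkdir(parents=True, exist_ok=True)
    archives = sorted(Path(archive_dir).glob(pattern))
    # Use the safer "data" extraction filter where this Python version provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    for archive in archives:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(path=dataroot, **kwargs)
    return [a.name for a in archives]
```

The return value lists the archives that were processed, which makes it easy to spot a missing part (for instance, only two of the ten trainval archives).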