Stereo event- and frame-based benchmark dataset for scene understanding
Posted on 15.12.2021, 19:55, authored by James Turner, Jens Pedersen, Jörg Conradt and Thomas Nowotny.

This is a proof-of-concept labelled dataset for training semantic segmentation and pose estimation vision systems with neuromorphic event-based vision.
Recorded data directories take the form ./data/[prop]/[sample #]. Prop meshes are in ./props/[prop] in STL format.
Each data directory contains three HDF5 files holding segmented RGB frames, segmented DVS events and prop pose information. For each 30-second recording, both raw and undistorted (lens-distortion-corrected) data are saved. RGB frame and visual event labels are integers, with 0 meaning 'background', -1 meaning 'ambiguous' (multiple props overlap) and any i > 0 denoting class i. The fields of the RGB frame HDF5 file frame.h5 are as follows, where i in {0, ..., n - 1} and n = 2 is the number of cameras:
timestamp_i: frame timestamp (camera i)
image_raw_i: frame before distortion correction (camera i)
image_undistorted_i: frame after distortion correction (camera i)
label_i: pixelwise frame label (camera i)
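Using the label convention above (0 = background, -1 = ambiguous, i > 0 = class i), a pixelwise label image can be split into per-class masks. A minimal sketch with NumPy; the label array here is synthetic for illustration, not loaded from frame.h5:

```python
import numpy as np

# Synthetic pixelwise label image following the dataset convention:
# 0 = background, -1 = ambiguous (overlapping props), i > 0 = class i.
label = np.array([[0, 0, 1],
                  [2, -1, 1],
                  [0, 2, 2]])

background = label == 0
ambiguous = label == -1
# One boolean mask per prop class present in the image.
prop_masks = {int(i): label == i for i in np.unique(label) if i > 0}

print(sorted(prop_masks))    # prop classes present in this frame
print(int(ambiguous.sum()))  # number of ambiguous pixels
```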
The fields of the DVS event HDF5 file event.h5 are as follows:
timestamp_i: event timestamp (camera i)
polarity_i: event polarity (camera i)
xy_raw_i: event x and y before distortion correction (camera i)
xy_undistorted_i: event x and y after distortion correction (camera i)
label_i: event label (camera i)
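Since all event fields for a camera share the same length, they can be filtered in lockstep; one common preprocessing step is to drop background and ambiguous events. A sketch with synthetic stand-ins for the camera-0 fields (timestamp_0, polarity_0, xy_undistorted_0, label_0); real data would be read from event.h5:

```python
import numpy as np

# Synthetic stand-ins for the per-camera event fields of event.h5.
timestamp = np.array([10, 11, 12, 13, 14])
polarity = np.array([1, 0, 1, 1, 0])
xy = np.array([[5, 7], [6, 7], [5, 8], [9, 2], [9, 3]])  # x, y per event
label = np.array([1, -1, 1, 0, 2])  # 0 = background, -1 = ambiguous

# Keep only events unambiguously belonging to a prop (label > 0),
# applying the same boolean mask to every field.
keep = label > 0
timestamp, polarity, xy, label = (a[keep] for a in (timestamp, polarity, xy, label))

print(len(timestamp))  # events surviving the filter
```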
Prop pose information, including translation and rotation (in millimetres and degrees respectively), is stored as floating-point numbers for both global 3D tracking coordinates and camera-centric coordinates. A floating-point not-a-number (NaN) indicates bad or missing data. The fields of the prop pose HDF5 file pose.h5 are as follows:
timestamp: pose timestamp
extrapolated[p]: true when the pose of prop p was extrapolated
rotation[p]: prop p rotation in global coordinates
camera_rotation_i[p]: prop p rotation relative to camera i
translation[p][m]: prop p marker m translation in global coordinates
camera_translation_i[p][m]: prop p marker m translation relative to camera i
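A pose sample should therefore be checked for NaNs before use, and the extrapolated flag tells you whether it was directly tracked or inferred. A minimal validity check, using hypothetical values for a single prop at a single timestamp:

```python
import numpy as np

# Hypothetical single-prop pose sample; NaN marks bad or missing data.
rotation = np.array([10.0, np.nan, 0.0])     # degrees, global coordinates
translation = np.array([120.0, 35.5, -4.2])  # millimetres, global coordinates
extrapolated = True                          # pose was extrapolated, not tracked

# A pose is usable only if no component is NaN.
valid = not (np.isnan(rotation).any() or np.isnan(translation).any())
print(valid, extrapolated)
```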
The data are stored in HDF5 format, which is accessible with a number of tools, e.g. PyTables or h5py in Python.
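As a sketch of access in Python, the snippet below uses h5py (rather than PyTables) and first writes a toy file mimicking part of the frame.h5 layout, since the actual dataset files are not bundled here; with the real data you would open the frame.h5 inside a ./data/[prop]/[sample #] directory instead:

```python
import h5py
import numpy as np

# Create a toy file imitating two frame.h5 fields for camera 0.
with h5py.File("toy_frame.h5", "w") as f:
    f["timestamp_0"] = np.array([0.0, 33.3, 66.6])              # one per frame
    f["label_0"] = np.zeros((3, 4, 4), dtype=np.int8)           # pixelwise labels

# Read the datasets back into NumPy arrays.
with h5py.File("toy_frame.h5", "r") as f:
    ts = f["timestamp_0"][:]
    labels = f["label_0"][:]

print(ts.shape, labels.shape)  # (3,) (3, 4, 4)
```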