Most publicly available embodied datasets force a tradeoff among realism, scale, and precise hand-state measurement. SynData aims to reduce that tradeoff by combining high-accuracy exoskeleton-glove capture with large-scale egocentric video and bare-hand recordings. The result is a set of clip-aligned multimodal sequences intended for real-world manipulation and imitation-learning research.
What Sets It Apart
- High-precision hand/arm signals and tactile channels: captured with PsiBot's exoskeleton glove, which provides millimeter-level positioning and full-DoF joint states, a combination rarely found in large public datasets. These signals are useful for learning precise manipulation primitives and inverse kinematics.
- Large clip-level scale and diversity: subsets include an egocentric collection (313,674 clips) and multiple glove-based subsets (e.g., glove-origin, with 95,383 clips), enabling both vision-centric and action-centric training at scale.
- Hybrid data strategy: combines exoskeleton-derived measurements with bare-hand clips and background-replaced variants, so models can learn from high-fidelity structured signals while still seeing natural interaction variability.
- Clip-oriented Zarr storage: data are distributed as compressed .zarr.tar volumes with manifests and Parquet indices, so researchers can stream or selectively download subsets instead of pulling monolithic archives.
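Because each volume is listed in an index, a download script can decide what to fetch before opening any archive. The following Python sketch shows that selection step under stated assumptions: manifest rows are plain dicts, and the column names (`subset`, `volume`, `num_clips`), volume paths, and clip counts are hypothetical toy values, not SynData's actual schema. The function name `select_volumes` is likewise illustrative.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Toy manifest rows with HYPOTHETICAL column names ("subset", "volume",
# "num_clips") and made-up values. With a real Parquet index you might build
# this list via pandas.read_parquet(path).to_dict("records"), then rename
# columns to match whatever the actual schema uses.
TOY_MANIFEST: List[Dict] = [
    {"subset": "egocentric", "volume": "ego/v0.zarr.tar", "num_clips": 100},
    {"subset": "glove-origin", "volume": "glove/v0.zarr.tar", "num_clips": 40},
    {"subset": "egocentric", "volume": "ego/v1.zarr.tar", "num_clips": 100},
    {"subset": "glove-origin", "volume": "glove/v1.zarr.tar", "num_clips": 40},
]


def select_volumes(
    rows: Iterable[Dict],
    wanted_subsets: Iterable[str],
    max_clips: Optional[int] = None,
) -> Tuple[List[str], int]:
    """Return (volume paths to fetch, total clips they contain).

    Rows are scanned in manifest order. Only whole volumes are selected, so
    once the running clip count reaches max_clips no further volumes are
    added; the total may therefore exceed max_clips by up to one volume.
    """
    wanted = set(wanted_subsets)
    chosen: List[str] = []
    total = 0
    for row in rows:
        if row["subset"] not in wanted:
            continue  # skip volumes from subsets we did not ask for
        if max_clips is not None and total >= max_clips:
            break  # budget reached; stop before adding another volume
        chosen.append(row["volume"])
        total += int(row["num_clips"])
    return chosen, total


if __name__ == "__main__":
    volumes, clips = select_volumes(TOY_MANIFEST, ["glove-origin"])
    print(volumes, clips)  # -> ['glove/v0.zarr.tar', 'glove/v1.zarr.tar'] 80
```

Selecting whole volumes matches archive-level downloads. One option for fetching the chosen paths is `huggingface_hub.hf_hub_download(repo_id=..., filename=..., repo_type="dataset")`, with the repository ID taken from the dataset's Hugging Face page.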
Who It's For and Tradeoffs
Great fit if you need high-resolution hand/arm kinematics aligned with first-person vision for tasks such as imitation learning, visuomotor policy training, manipulation behavior cloning, or multimodal representation learning. It’s also useful for benchmarking embodied models that require synchronized tactile, pose, and visual inputs. Look elsewhere if you need a small, easily hosted dataset: SynData’s volume and clip counts imply substantial storage and I/O requirements, and working with the raw Zarr volumes requires familiarity with Zarr and Parquet workflows. The dataset is released via Hugging Face and assumes tooling that can handle large compressed archives and chunked arrays.
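Working with the raw volumes usually starts with unpacking a downloaded `.zarr.tar` archive. The sketch below does this with Python's standard `tarfile` module and validates every member first, since extracting an archive without checks can write files outside the target directory. It assumes each archive contains only regular files and directories (typical for a Zarr directory store); the function name `extract_volume` is hypothetical.

```python
import os
import tarfile
from typing import List


def extract_volume(tar_path: str, dest_dir: str) -> List[str]:
    """Unpack one .zarr.tar volume into dest_dir; return its top-level entries.

    Every member is validated before anything is written: only regular files
    and directories are allowed, and no member may resolve outside dest_dir.
    """
    dest_root = os.path.realpath(dest_dir)
    os.makedirs(dest_root, exist_ok=True)
    top_level = set()
    # "r:*" lets tarfile detect gzip/bz2/xz compression (or none) automatically.
    with tarfile.open(tar_path, mode="r:*") as tar:
        members = tar.getmembers()
        for member in members:
            # ASSUMPTION: Zarr directory stores need no symlinks or device files.
            if not (member.isfile() or member.isdir()):
                raise ValueError(f"unsupported member type: {member.name}")
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name}")
            rel = os.path.relpath(target, dest_root)
            if rel != os.curdir:
                top_level.add(rel.split(os.sep)[0])
        # Python versions that support extraction filters also apply "data".
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_root, members=members, filter="data")
        else:
            tar.extractall(dest_root, members=members)
    return sorted(top_level)
```

Each returned entry could then be opened lazily, for example with `zarr.open(os.path.join(dest_dir, name), mode="r")`, assuming the entry is a Zarr store root; the actual layout inside SynData archives should be checked against its manifests.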
Where It Fits
Compared with typical egocentric video datasets, SynData adds dense hand-state and tactile telemetry; compared with lab-scale motion-capture glove datasets, it offers much larger clip counts and more varied real-world tasks. Treat it as a middle ground for teams that need precise manipulation labels at practical training scale.
