genrobot2025/10Kh-RealOmin-OpenData

Large-scale, real-world dual-arm video corpus for embodied robotics and reinforcement-learning research — over 1TB of multimodal recordings on Hugging Face, intended for training and evaluating agents in real manipulation scenarios; CC BY‑SA 4.0.

Introduction

Large real-world video corpora for dual-arm manipulation and embodied reinforcement learning remain rare. This dataset provides over 1TB of real-world, dual-arm video recordings hosted on Hugging Face, giving researchers a resource focused on real manipulation dynamics, agent-environment interaction, and downstream RL and behavioral benchmarks. The dataset has drawn substantial community interest (downloads: 81,392; likes: 211) and is updated as of 2026-04-24.

What Sets It Apart
  • Real-world dual-arm focus: tagged and curated for dual-arm/robotic manipulation scenarios, making it directly relevant for embodied agent research rather than synthetic or single-arm benchmarks.
  • Scale and modality: classified as >1TB of video data with multimodal recordings (video-first), enabling training of vision-based policies, video-language-action (VLA) models, and long-horizon behavior learning.
  • Open licensing and discoverability: hosted on Hugging Face under CC BY‑SA 4.0 (per dataset tags), which supports academic reuse with share‑alike requirements and simplifies community access and tooling integration.
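Because the corpus is listed at over 1TB, it can be worth inspecting a few files before committing to a full download. The following is a minimal sketch, assuming the `huggingface_hub` Python package is installed and that the repository contains files ending in `.mp4`. The file extension and the helper names (`select_files`, `list_remote_files`, `download_sample`) are illustrative assumptions, not something the dataset card confirms.

```python
from fnmatch import fnmatchcase

# Repository id as given in this listing.
REPO_ID = "genrobot2025/10Kh-RealOmin-OpenData"


def select_files(filenames, patterns=("*.mp4",), limit=5):
    """Return at most `limit` names matching any glob pattern, in input order."""
    # The default "*.mp4" pattern is a guess at the file layout, not a documented fact.
    matches = [
        name for name in filenames
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]
    return matches[:limit]


def list_remote_files(repo_id=REPO_ID):
    """List every file path in the dataset repository (needs network access)."""
    # Imported lazily so the pure helper above works without the package.
    from huggingface_hub import list_repo_files
    return list_repo_files(repo_id, repo_type="dataset")


def download_sample(repo_id=REPO_ID, patterns=("*.mp4",), limit=5):
    """Download a few matching files and return their local cache paths."""
    from huggingface_hub import hf_hub_download
    wanted = select_files(list_remote_files(repo_id), patterns, limit)
    return [hf_hub_download(repo_id, name, repo_type="dataset") for name in wanted]
```

Calling `list_remote_files()` first shows the actual layout, so the patterns can be adjusted before `download_sample(limit=2)` fetches anything. Sampling this way keeps the initial transfer small compared with pulling the whole repository.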

Who it's for and tradeoffs

Great fit if you are an embodied-AI or robotics lab building vision-based control or RL systems that need extensive real footage of manipulation and dual-arm interactions. It is also useful for benchmarking sim-to-real transfer and for training video-language-action (VLA) models. Look elsewhere if you need finely annotated per-frame poses or segmentation (consult the dataset card for annotation details), require a permissive commercial license without share-alike constraints, or need primarily simulated or synthetic data for controlled physics studies.

Where it fits

This dataset occupies the niche between smaller, highly annotated robotics datasets and large-but-synthetic simulated corpora: it trades some annotation granularity for scale and realistic sensor noise, making it particularly valuable for research on robustness, long-horizon planning, and embodied multimodal models.

Information

Categories