Generates short videos from text, image, or video inputs and ships a full training and inference pipeline with checkpoints and demos. Key features include multi-stage training (VAE / 3D-VAE), rectified-flow training, video compression modules, and support for clips of 2 to 16 seconds at resolutions up to 720p. Best for researchers and engineers with access to substantial GPU resources.
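
To make the rectified-flow training mentioned above more concrete, here is a minimal NumPy sketch of the generic rectified-flow objective. It is not the project's own code: the function names `rectified_flow_batch` and `rectified_flow_loss`, the tensor shapes, and the convention that `t = 0` is pure noise and `t = 1` is clean data are all assumptions made for illustration. The idea is to interpolate linearly between a noise sample and a data sample, then train a model to predict the constant velocity along that straight line.

```python
import numpy as np


def rectified_flow_batch(x_data, rng):
    """Build one rectified-flow training batch (illustrative, hypothetical helper).

    x_data: array of shape (batch, ...) with clean samples, e.g. video latents.
    Returns (x_t, t, v_target), where x_t is the noisy interpolant, t the
    per-sample time in [0, 1], and v_target the velocity the model should predict.
    """
    noise = rng.standard_normal(x_data.shape)
    # One time value per sample, shaped to broadcast over the remaining axes.
    t_shape = (x_data.shape[0],) + (1,) * (x_data.ndim - 1)
    t = rng.uniform(0.0, 1.0, size=t_shape)
    # Straight-line path from noise (t = 0) to data (t = 1); assumed convention.
    x_t = (1.0 - t) * noise + t * x_data
    # The velocity along a straight line is constant: data minus noise.
    v_target = x_data - noise
    return x_t, t, v_target


def rectified_flow_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((v_pred - v_target) ** 2))


rng = np.random.default_rng(0)
latents = rng.standard_normal((4, 3, 8, 8))  # stand-in for a batch of latents
x_t, t, v_target = rectified_flow_batch(latents, rng)
# A real model would map (x_t, t, conditioning) to v_pred; a zero prediction
# is used here only so the loss call has something to evaluate.
loss = rectified_flow_loss(np.zeros_like(v_target), v_target)
```

In a real training loop, `v_pred` would come from the video model conditioned on `x_t`, `t`, and the text or image prompt; sampling would then integrate the predicted velocity from noise toward data.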