Overview
ColossalAI combines tensor, pipeline, sequence, and data parallelism in one framework and can search automatically for a hybrid (3-D) parallel strategy. It targets near-linear scaling on multi-GPU clusters and reduces memory footprint through the Gemini memory manager.
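To make the idea of a parallel strategy search concrete, the sketch below enumerates the candidate layouts such a search could choose from. It is a plain-Python illustration, not ColossalAI's planner: the function name `candidate_layouts` and the `max_tp` cap are hypothetical, and the only assumption is that a layout is a triple (data, tensor, pipeline) whose product equals the number of GPUs.

```python
def candidate_layouts(world_size, max_tp=8):
    """Return every (dp, tp, pp) triple with dp * tp * pp == world_size.

    Hypothetical helper for illustration only. max_tp caps the tensor-parallel
    degree, since tensor parallelism is usually kept within a single node.
    """
    layouts = []
    for tp in range(1, min(max_tp, world_size) + 1):
        if world_size % tp:
            continue  # tp must divide the GPU count
        rest = world_size // tp
        for pp in range(1, rest + 1):
            if rest % pp:
                continue  # pp must divide what is left
            dp = rest // pp  # data parallelism takes the remaining factor
            layouts.append((dp, tp, pp))
    return layouts


if __name__ == "__main__":
    for dp, tp, pp in candidate_layouts(8):
        print(f"data={dp} tensor={tp} pipeline={pp}")
```

A real planner would additionally score each candidate with a cost model covering memory and communication. That part is omitted here because the overview does not describe it.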
Key Capabilities
- ZeRO, Gemini & chunk-based memory optimization
- Hybrid (3-D) parallelism with automatic planner
- FlashAttention, fused kernels and BF16/FP8 support
- CLI & Profiler for job orchestration and monitoring
- Seamless DeepSpeed compatibility
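
The chunk-based memory optimization listed above can be pictured as grouping parameters into fixed-size chunks so that they can be moved or sharded as a unit. The following is a minimal sketch under that assumption. It is not Gemini's actual implementation: `pack_into_chunks`, the parameter names, and the sizes are all hypothetical.

```python
def pack_into_chunks(param_sizes, chunk_size):
    """Group parameters, in order, into chunks holding at most chunk_size elements.

    param_sizes: list of (name, element_count) pairs (hypothetical input format).
    Returns a list of chunks, each a list of parameter names.
    """
    chunks = [[]]
    used = 0  # elements already placed in the current chunk
    for name, size in param_sizes:
        if size > chunk_size:
            raise ValueError(f"{name} ({size}) does not fit in one chunk of {chunk_size}")
        if used + size > chunk_size:
            chunks.append([])  # current chunk is full; open a new one
            used = 0
        chunks[-1].append(name)
        used += size
    return chunks


if __name__ == "__main__":
    params = [("a", 4), ("b", 4), ("c", 3), ("d", 6)]
    print(pack_into_chunks(params, chunk_size=8))
```

Packing in declaration order keeps parameters that are used together in the same chunk, which is the usual motivation for chunking. The trade-off is some unused space at the end of each chunk.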