High-performance, scalable gradient-boosted decision tree library for regression, classification, ranking, and custom objectives. Provides bindings for Python, R, Java, Scala, and C++, and supports single-node, distributed, and GPU training. Widely used for tabular data and ML competitions.
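To make "gradient-boosted trees with custom objectives" concrete, here is a minimal pure-Python sketch, not the library's implementation or API. It assumes a single numeric feature, depth-1 trees (stumps), and a hypothetical objective signature that returns per-example gradients and Hessians; the names `squared_error`, `fit_stump`, and `boost` are illustrative only.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical objective signature (an assumption for this sketch):
# (predictions, labels) -> (gradients, hessians), one value per example.
Objective = Callable[[List[float], List[float]], Tuple[List[float], List[float]]]
Stump = Tuple[Optional[float], float, float]  # (threshold, left value, right value)


def squared_error(preds: List[float], labels: List[float]):
    """Gradient and Hessian of 0.5 * (pred - label)^2 with respect to pred."""
    grads = [p - y for p, y in zip(preds, labels)]
    hess = [1.0] * len(preds)
    return grads, hess


def fit_stump(xs, grads, hess, reg_lambda: float = 1.0) -> Stump:
    """Choose the split on one feature that maximizes a second-order gain."""
    def score(g, h):   # structure score of a leaf (constant factors omitted)
        return g * g / (h + reg_lambda)

    def value(g, h):   # Newton-step leaf value with L2 regularization
        return -g / (h + reg_lambda)

    g_all, h_all = sum(grads), sum(hess)
    best_gain, best = 0.0, (None, value(g_all, h_all), value(g_all, h_all))
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    g_left = h_left = 0.0
    for rank in range(len(order) - 1):
        i, nxt = order[rank], order[rank + 1]
        g_left += grads[i]
        h_left += hess[i]
        if xs[i] == xs[nxt]:
            continue  # cannot split between equal feature values
        gain = (score(g_left, h_left)
                + score(g_all - g_left, h_all - h_left)
                - score(g_all, h_all))
        if gain > best_gain:
            threshold = (xs[i] + xs[nxt]) / 2
            best_gain = gain
            best = (threshold, value(g_left, h_left),
                    value(g_all - g_left, h_all - h_left))
    return best


def predict_stump(stump: Stump, x: float) -> float:
    threshold, left, right = stump
    if threshold is None:
        return left
    return left if x < threshold else right


def boost(xs, ys, objective: Objective, rounds: int = 50, learning_rate: float = 0.3):
    """Additively fit stumps to the gradients/Hessians of the given objective."""
    preds = [0.0] * len(xs)
    stumps: List[Stump] = []
    for _ in range(rounds):
        grads, hess = objective(preds, ys)
        stump = fit_stump(xs, grads, hess)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return stumps, preds


# Usage: a step function is recovered by repeatedly splitting near x = 2.5.
stumps, preds = boost([0, 1, 2, 3, 4, 5], [0, 0, 0, 10, 10, 10], squared_error)
```

Swapping in a different objective only requires another function with the same gradient/Hessian signature; the boosting loop itself does not change.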
Provides research-grade implementations and pretrained models for sequence tasks (translation, language modeling, speech). Offers multi-GPU training, fast generation (beam search, sampling, lexically constrained decoding), mixed-precision training, and state sharding. Aimed at researchers reproducing or extending published papers.
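As an illustration of the beam-search generation mentioned above, the following is a simplified sketch in plain Python, not the toolkit's decoder. It assumes a hypothetical `next_log_probs` callback that maps a token prefix to next-token log-probabilities, and it omits length normalization, sampling, and lexical constraints; `toy_model` and its probabilities are made up for the example.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[float, List[str]]  # (cumulative log-probability, tokens)


def beam_search(next_log_probs: Callable[[List[str]], Dict[str, float]],
                beam_size: int, max_len: int, eos: str = "</s>") -> Hypothesis:
    """Keep the `beam_size` best partial hypotheses at each step."""
    beams: List[Hypothesis] = [(0.0, [])]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = [(score + lp, tokens + [tok])
                      for score, tokens in beams
                      for tok, lp in next_log_probs(tokens).items()]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_size]:
            # Hypotheses ending in EOS leave the beam; the rest continue.
            (finished if tokens[-1] == eos else beams).append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len was reached
    return max(finished, key=lambda c: c[0])


def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical next-token distribution where greedy decoding is suboptimal."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.5, "y": 0.5},
        ("b",): {"z": 1.0},
    }
    probs = table.get(tuple(prefix), {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


greedy = beam_search(toy_model, beam_size=1, max_len=4)
wider = beam_search(toy_model, beam_size=2, max_len=4)
```

With a beam of 1 the search commits to "a" and ends with total probability 0.3, while a beam of 2 keeps "b" alive and finds the sequence "b z" with probability 0.4.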
Orchestrates and scales Python-based AI/ML workloads from a laptop to thousands of GPUs. Exposes task and actor primitives plus high-level libraries for training, hyperparameter tuning, serving, reinforcement learning, and data processing. Designed for heterogeneous accelerators and production ML pipelines.
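The task and actor primitives can be pictured with a single-process analogy built only on the Python standard library. This is a conceptual sketch under stated assumptions, not the framework's API: a "task" is modeled as a stateless function submitted to a thread pool, and an "actor" as a stateful object whose method calls are serialized on one dedicated worker. The `Actor`, `Counter`, and `demo` names are hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def square(x: int) -> int:
    """A stateless unit of work, standing in for a 'task'."""
    return x * x


class Counter:
    """Plain stateful class that the actor wrapper will own."""
    def __init__(self) -> None:
        self.value = 0

    def add(self, n: int) -> int:
        self.value += n
        return self.value


class Actor:
    """Owns one instance; calls run one at a time, in submission order."""
    def __init__(self, cls, *args, **kwargs) -> None:
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._instance = cls(*args, **kwargs)

    def call(self, method: str, *args, **kwargs) -> Future:
        # Returns a future immediately; the caller decides when to block.
        return self._executor.submit(getattr(self._instance, method), *args, **kwargs)

    def stop(self) -> None:
        self._executor.shutdown(wait=True)


def demo():
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Tasks: independent calls that may run in parallel.
        futures = [pool.submit(square, i) for i in range(5)]
        squares = [f.result() for f in futures]

    # Actor: state persists across calls, and calls never interleave.
    counter = Actor(Counter)
    for s in squares:
        counter.call("add", s)
    total = counter.call("add", 0).result()
    counter.stop()
    return squares, total


squares, total = demo()
```

A distributed framework would place tasks and actors on different machines and accelerators; the sketch only shows the programming model, in which tasks return futures and actors keep state between calls.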