Overview
CatBoost is an open-source machine-learning library developed by Yandex for gradient boosting on decision trees.
It distinguishes itself through:
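- Native categorical-feature support – encodes categories with ordered target statistics computed over random permutations of the training data, so high-cardinality features need no manual one-hot encoding.
- Ordered boosting – a permutation-driven training scheme that reduces prediction shift (a form of target leakage in standard gradient boosting) and, with it, overfitting.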
- High-performance GPU/CPU implementations – scales to large datasets and multi-GPU systems.
- Rich language bindings and export formats – train in Python or R, apply models from C++, Java, C#, or Rust, and export them to Core ML, ONNX, or PMML.
- Comprehensive tooling – built-in visualizations, model analysis, SHAP-based feature importance, and quantization for efficient inference.
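To make the categorical-feature idea above concrete, here is a minimal pure-Python sketch of an ordered target statistic: each row is encoded from the labels of the rows that precede it in a random permutation, smoothed toward a prior. The function name `ordered_target_statistics` and its parameters (`prior`, `prior_weight`, `seed`) are illustrative assumptions, not part of the CatBoost API, and the library's actual implementation (multiple permutations, feature combinations, and so on) is considerably more involved.

```python
import random
from collections import defaultdict


def ordered_target_statistics(categories, targets, prior, prior_weight=1.0, seed=0):
    """Hypothetical helper: encode one categorical column as numbers.

    Each row is encoded using only the targets of rows that come before it
    in a random permutation, so a row's own label never enters its encoding.
    """
    n = len(categories)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # one random permutation of the rows

    sums = defaultdict(float)  # running sum of targets per category
    counts = defaultdict(int)  # running number of rows seen per category
    encoded = [0.0] * n

    for idx in order:
        cat = categories[idx]
        # Smoothed mean of the targets seen so far for this category.
        encoded[idx] = (sums[cat] + prior_weight * prior) / (counts[cat] + prior_weight)
        # Only now add the current row to the running statistics.
        sums[cat] += targets[idx]
        counts[cat] += 1
    return encoded


colors = ["red", "blue", "red", "red", "blue"]
labels = [1, 0, 1, 0, 1]
prior = sum(labels) / len(labels)  # global mean of the target
encoded = ordered_target_statistics(colors, labels, prior)
print(encoded)
```

The first row of each category in the permutation has no history, so it falls back to the prior; later rows move toward the observed mean for their category. A plain (non-ordered) target mean would instead use every row's own label, which is the leakage that the ordered scheme is meant to avoid.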
Since its release, CatBoost has powered search ranking, recommendations, and autonomous-driving systems at Yandex and has been adopted by organizations such as CERN, Cloudflare, and JetBrains. It is licensed under Apache 2.0, actively developed on GitHub, and released regularly on PyPI.