CatBoost

Open-source gradient-boosting library from Yandex that natively handles categorical features and offers fast CPU/GPU training.

Introduction

Overview

CatBoost is an open-source machine-learning library developed by Yandex for gradient boosting on decision trees.
It distinguishes itself through:

Native categorical-feature support – encodes categories with ordered target statistics computed over random permutations of the data, so no manual one-hot encoding is needed.
Ordered boosting – mitigates prediction shift and overfitting.
High-performance GPU/CPU implementations – scales to large datasets and multi-GPU systems.
Rich language bindings – train in Python or R, apply models from C++, Java, C#, and Rust, and export them to Core ML, ONNX, and PMML.
Comprehensive tooling – built-in visualizations, model analysis, SHAP-based feature importance, and quantization for efficient inference.
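
The ordered encoding behind the categorical-feature support can be illustrated with a small pure-Python sketch. This is not CatBoost's implementation: the function name `ordered_target_stats`, the `prior` default, and the use of a single permutation are assumptions made for illustration, and the real library adds further machinery such as multiple permutations and feature combinations. The idea shown is that each row is encoded only from the labels of rows that come before it in a random order, so its own label cannot leak into its feature value.

```python
import random
from collections import defaultdict


def ordered_target_stats(categories, targets, prior=0.5, seed=0):
    """Encode a categorical column with ordered target statistics.

    Each row is encoded using only the targets of rows that precede it in a
    random permutation, so a row's own label never enters its encoding.
    Simplified illustration; names and defaults are hypothetical.
    """
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)  # one random permutation of the rows

    sums = defaultdict(float)  # running sum of targets seen per category
    counts = defaultdict(int)  # running number of rows seen per category
    encoded = [0.0] * len(categories)

    for idx in order:
        cat = categories[idx]
        # Smoothed mean of earlier targets; equals `prior` on first sighting.
        encoded[idx] = (sums[cat] + prior) / (counts[cat] + 1)
        sums[cat] += targets[idx]
        counts[cat] += 1
    return encoded


if __name__ == "__main__":
    colors = ["red", "blue", "red", "red", "blue", "green"]
    clicked = [1, 0, 1, 0, 1, 0]
    for color, value in zip(colors, ordered_target_stats(colors, clicked)):
        print(f"{color:>5} -> {value:.3f}")
```

A category seen for the first time in the permutation receives the prior, and later rows of the same category receive a smoothed running mean of the earlier labels. A plain per-category mean over the whole dataset would instead include each row's own label, which is the leakage that ordered schemes are designed to avoid.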

Since its release, CatBoost has powered search ranking, recommendations, and autonomous-driving systems at Yandex and has been adopted by organizations such as CERN, Cloudflare, and JetBrains. It is licensed under Apache 2.0 and actively developed on GitHub, with regular releases published on PyPI.

Information

Website: catboost.ai
Authors: Yandex
Published date: 2017/07/18

More Items

NautilusTrader

2018

Nautech Systems Pty Ltd

NautilusTrader is an open-source, high-performance event-driven algorithmic trading platform and backtester by Nautech Systems. Its Rust-based core with Python bindings provides parity between research/backtest and production/live deployments, supports multi-venue and multi-asset strategies, advanced order types, optional high-precision numeric modes, and is fast enough to be used to train AI trading agents (RL/ES).

mlops ai-train ai-development ai-library github+4

MLX Examples

2023

ml-explore (GitHub organization)

MLX Examples is an open-source repository by the ml-explore organization that provides runnable examples for the MLX framework. It includes examples across text (LLaMA, Mistral, T5, BERT, MoE, LoRA/QLoRA), image (FLUX, Stable Diffusion/SDXL, ResNets, CVAE), audio (Whisper, EnCodec, MusicGen), multimodal (CLIP, LLaVA, SAM) and other model types. The repo is intended to help developers and researchers learn MLX workflows for training, fine-tuning, generation, and inference, and links to MLX community checkpoints on Hugging Face.

github ai-framework ai-train mlops ai-image+2

fairseq

2017

Facebook AI Research (FAIR)

fairseq is an open-source sequence modeling toolkit from Facebook AI Research (FAIR), implemented in Python on top of PyTorch. It provides reference implementations for a wide range of sequence models (Transformer, LSTM, convolutional, wav2vec, wav2vec 2.0, etc.) and supports tasks such as machine translation, summarization, language modeling, and speech processing. Key features include multi-GPU and distributed training, fast generation (beam search, sampling, diverse beam search), mixed-precision training, parameter/optimizer sharding, and many pre-trained models and examples. The project is MIT-licensed and documented on Read the Docs.

github NLP ASR audio translation+2