Provides a Python library of classic ML algorithms, preprocessing, model selection, and evaluation tools. Emphasizes a uniform estimator API, Pipelines, and tight NumPy/SciPy integration. Suited for teaching and rapid prototyping on tabular data and small-to-medium workloads.
Provides APIs to build, learn, and run Bayesian and dynamic Bayesian networks, perform probabilistic inference, and compute interventional/counterfactual queries. Ships example notebooks, tutorials, and PyPI/conda packages. ([github.com](https://github.com/pgmpy/pgmpy))
High-performance, scalable gradient-boosted decision tree library for regression, classification, ranking, and custom objectives. Offers bindings for Python, R, Java, Scala, and C++, with single-node, distributed, and GPU training. Widely used for tabular data and ML competitions.
Programmatically author, schedule, and monitor data workflows using Python-defined DAGs. Features modular executors, a rich provider/operator ecosystem (Kubernetes, AWS, GCP), and built-in scheduling and monitoring for batch and ML pipelines.
Provides a Python-native, open-source deep learning framework with dynamic (eager) computation graphs, GPU acceleration, and a large ecosystem of libraries and pre-trained models — widely used for research and production. ([github.com](https://github.com/pytorch/pytorch))
Provides research-grade implementations and pretrained models for sequence tasks (translation, language modeling, speech). Offers multi-GPU training, fast generation (beam search, sampling, lexical constraints), mixed-precision training, and state sharding. Aimed at researchers reproducing or extending papers.
Orchestrates and scales Python-based AI/ML workloads from laptop to thousands of GPUs — exposing task and actor primitives plus high-level libraries for training, hyperparameter tuning, serving, RL, and data processing. Designed for heterogeneous accelerators and production ML pipelines.