A lightweight open-source platform for running, managing, and integrating large language models locally via a simple CLI and REST API.
Programmatically author, schedule, and monitor data workflows using Python-defined DAGs. Features modular executors, rich provider/operator ecosystem (Kubernetes, AWS, GCP), and built-in scheduling/monitoring for batch and ML pipelines.
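As a rough illustration of what a Python-defined DAG amounts to, the sketch below uses only the standard library's `graphlib` to declare task dependencies and derive a valid run order. The task names (`extract`, `transform`, `load`, `report`) and the `run_order` helper are hypothetical. This is not the platform's own API, only the underlying idea of ordering tasks by their dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a task, each value is the set of
# tasks that must finish before that task can start.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_order(deps):
    """Return one valid execution order for the task graph."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

A real scheduler adds timing, retries, and distributed executors on top of this ordering step. A dependency cycle would make `static_order()` raise `graphlib.CycleError`, which is why such workflows must stay acyclic.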
ONNX (Open Neural Network Exchange) is an open-source format for AI models, covering both deep learning and traditional ML. It defines an extensible computation graph model, built-in operators, and standard data types, with a focus on inference. It is widely supported across frameworks and hardware, which enables interoperability between them.
Orchestrates and scales Python-based AI/ML workloads from a laptop to thousands of GPUs, exposing task and actor primitives plus high-level libraries for training, hyperparameter tuning, serving, reinforcement learning, and data processing. Designed for heterogeneous accelerators and production ML pipelines.
Stores and indexes high-dimensional embeddings for scalable approximate nearest neighbor (ANN) vector search. Distributed, Kubernetes-native architecture with multiple index types, GPU acceleration, and hybrid dense+sparse search. Suitable for semantic search, retrieval-augmented generation (RAG), and recommendation pipelines.
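To make the vector search idea concrete, here is a minimal sketch of exact (brute-force) nearest neighbor search by cosine similarity, written in plain Python. The `cosine` and `top_k` helpers and the three-document `corpus` are hypothetical and are not part of any vector database's API. A production system would replace the linear scan with an ANN index that trades a little accuracy for much faster lookups.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as having no similarity to anything.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, vectors, k=2):
    """Return the keys of the k vectors most similar to the query."""
    scored = [(cosine(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Hypothetical toy corpus of 3-dimensional embeddings.
corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

result = top_k([1.0, 0.0, 0.0], corpus, k=2)
print(result)
```

The linear scan costs one similarity computation per stored vector, which is the cost that index structures are meant to avoid at scale.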