JAX is a high-performance Python library that brings just-in-time compilation, automatic differentiation, and easy parallelism to NumPy-style array programming.
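A minimal sketch of the first two of those capabilities (assuming JAX is installed): `jax.jit` compiles a NumPy-style function, and `jax.grad` differentiates it automatically. The function `loss` here is just an illustrative example, not part of any library.

```python
import jax
import jax.numpy as jnp

@jax.jit                      # JIT-compile with XLA on first call
def loss(x):
    return jnp.sum(x ** 2)    # simple sum-of-squares loss

grad_loss = jax.grad(loss)    # d/dx sum(x^2) = 2x, derived automatically

x = jnp.arange(3.0)           # [0., 1., 2.]
print(loss(x))                # 5.0
print(grad_loss(x))           # [0., 2., 4.]
```

`jax.vmap` (vectorization) and `jax.pmap`/sharding (multi-device parallelism) compose with the same functions unchanged.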
NVIDIA TensorRT is an SDK and tool suite that compiles and optimizes trained neural-network models for fast, low-latency inference on NVIDIA GPUs.
NVIDIA NeMo is an end-to-end framework and micro-services platform for building, customizing, and deploying large language, speech, vision, and multimodal AI models.
NVIDIA Triton Inference Server is an open-source, high-performance server for deploying and scaling AI/ML models on GPUs or CPUs, supporting multiple frameworks and cloud/edge targets.
Megatron-LM is NVIDIA’s model-parallel training library for GPT-like transformers at multi-billion-parameter scale.
NVIDIA TensorRT-LLM is an open-source library that compiles Transformer blocks into highly optimized TensorRT engines for blazing-fast LLM inference on NVIDIA GPUs.
NVIDIA Dynamo is an open-source, high-throughput, low-latency inference framework that scales generative-AI and reasoning models across large, multi-node GPU clusters.