SGLang is a high-performance serving framework for large language models (LLMs) and vision-language models (VLMs), designed for low-latency, high-throughput inference from a single GPU up to large distributed clusters. Key features include RadixAttention for prefix caching, a zero-overhead batch scheduler, prefill-decode disaggregation, speculative decoding, continuous batching, paged attention, tensor/pipeline/expert/data parallelism, structured outputs, chunked prefill, and quantization (FP4/FP8/INT4/AWQ/GPTQ). It supports a wide range of models such as Llama, Qwen, and DeepSeek; runs on NVIDIA, AMD, and Intel GPUs as well as TPUs; and ships an intuitive frontend language for building LLM applications.
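To make the prefix-caching idea concrete, here is a toy sketch of a radix-style prefix tree over token IDs. This is only an illustration of the concept behind RadixAttention, not SGLang's implementation: the real system caches KV-cache tensors on the GPU and evicts them under memory pressure, while this sketch merely counts how many leading tokens of a new request could be reused. All class and method names here are made up for the example.

```python
# Toy sketch of prefix caching with a radix-style tree. This illustrates the
# idea behind RadixAttention only; SGLang's actual implementation caches GPU
# KV tensors and handles eviction. Names below are hypothetical.

class RadixNode:
    def __init__(self):
        self.children = {}  # token id -> RadixNode

class PrefixCache:
    def __init__(self):
        self.root = RadixNode()

    def insert(self, tokens):
        """Record a token sequence (e.g. a finished request) in the tree."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, RadixNode())

    def match_prefix(self, tokens):
        """Return how many leading tokens are already in the cache."""
        node, matched = self.root, 0
        for t in tokens:
            if t not in node.children:
                break
            node = node.children[t]
            matched += 1
        return matched

cache = PrefixCache()
cache.insert([1, 2, 3, 4])               # e.g. tokens of a shared system prompt
print(cache.match_prefix([1, 2, 3, 9]))  # -> 3: only the last token needs prefill
```

Requests that share a prompt prefix (a common system message, few-shot examples, multi-turn history) skip recomputing attention for the matched tokens, which is where the latency and throughput gains come from.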