Overview
TensorRT-LLM accelerates large-language-model inference by compiling models into TensorRT engines that combine custom attention kernels, paged KV caching, quantization (FP8, FP4, INT4, INT8), and speculative decoding.
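To make the workflow concrete, here is a minimal sketch using the high-level Python LLM API shipped in recent releases; the model name, prompt, and sampling values are placeholders, not recommendations.

```python
# Minimal end-to-end sketch of the Python LLM API (assumed available in
# recent TensorRT-LLM releases). The checkpoint name is a placeholder.
from tensorrt_llm import LLM, SamplingParams

def main():
    # Builds (or loads a cached) TensorRT engine for the checkpoint, then
    # runs generation through the underlying C++ runtime.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
    for output in llm.generate(
            ["Explain paged KV caching in one sentence."], params):
        print(output.outputs[0].text)

if __name__ == "__main__":
    main()
```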
Key Capabilities
- Automatic engine generation from PyTorch checkpoints (see the build sketch after this list)
- In-flight batching and lookahead speculative decoding for high throughput (lookahead sketch below)
- Multi-GPU / multi-node execution via tensor and pipeline parallelism, with a Triton Inference Server backend for production serving (parallelism sketch below)
- Python and C++ runtimes, plus an OpenAI-compatible serving endpoint (client sketch below)
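Engine generation and quantization can be driven from the same Python API. The sketch below assumes the `BuildConfig`, `QuantConfig`, and `QuantAlgo` names from recent releases; verify them against your installed version.

```python
# Hedged sketch: configuring the engine build and FP8 quantization through
# the LLM API. All class and field names are assumptions from recent
# TensorRT-LLM releases; the checkpoint name is a placeholder.
from tensorrt_llm import LLM, BuildConfig
from tensorrt_llm.llmapi import QuantConfig, QuantAlgo

build_config = BuildConfig(
    max_batch_size=8,     # upper bound for in-flight batching
    max_input_len=2048,   # longest prompt the engine will accept
    max_seq_len=4096,     # prompt plus generated tokens
)
quant_config = QuantConfig(quant_algo=QuantAlgo.FP8)  # FP8 needs Hopper-class GPUs

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder checkpoint
    build_config=build_config,
    quant_config=quant_config,
)
llm.save("./llama3-8b-fp8-engine")  # persist the built engine for reuse
```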
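Lookahead decoding is one of the speculative methods exposed through the same API. The following is a sketch only: `LookaheadDecodingConfig`, its fields, and the `speculative_config` argument are assumptions drawn from recent release examples.

```python
# Hedged sketch of enabling lookahead speculative decoding. Parameter names
# and values are assumptions; tune the window, n-gram, and verification-set
# sizes for your model and workload.
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.llmapi import LookaheadDecodingConfig

lookahead = LookaheadDecodingConfig(
    max_window_size=4,            # tokens guessed per decoding step
    max_ngram_size=3,             # n-gram pool used to propose guesses
    max_verification_set_size=4,  # candidate branches verified per step
)
llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder checkpoint
    speculative_config=lookahead,
)
outputs = llm.generate(["Summarize in-flight batching."],
                       SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```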
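For multi-GPU execution, the LLM API shards a model with a single argument; a minimal sketch, assuming two local GPUs (multi-node runs additionally rely on MPI, which is outside this sketch):

```python
# Hedged sketch: tensor parallelism across two GPUs via the LLM API.
# A pipeline_parallel_size argument exists as well for pipeline sharding.
from tensorrt_llm import LLM

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder checkpoint
    tensor_parallel_size=2,  # shard weights and attention heads over 2 GPUs
)
```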
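Because the serving endpoint is OpenAI-compatible, any standard OpenAI client can talk to it. The sketch below assumes a server is already running locally (recent releases ship a `trtllm-serve` command for this); the host, port, and model name are placeholders.

```python
# Hedged sketch: querying a running TensorRT-LLM server through the
# standard OpenAI Python SDK. Endpoint details are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model name
    messages=[{"role": "user", "content": "What is in-flight batching?"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```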