
Text-Generation-Inference

Hugging Face’s Rust + Python server for high-throughput, multi-GPU text generation.

Introduction

Overview

TGI shards decoder weights across GPUs, streams tokens to clients over Server-Sent Events (gRPC is used internally between the router and model shards), and exposes its own /generate route alongside an OpenAI-compatible /v1/chat/completions endpoint.
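A minimal sketch of what a request to the /generate route looks like (the "inputs"/"parameters" field names follow TGI's generate API; the host, port, prompt, and parameter values here are assumptions for illustration):

```python
import json

# Request body for TGI's /generate route: a prompt under "inputs" plus
# sampling options under "parameters" (example values, not defaults).
payload = {
    "inputs": "What is deep learning?",
    "parameters": {"max_new_tokens": 64, "temperature": 0.7},
}
body = json.dumps(payload)

# With a TGI server running (assumed here at localhost:8080), this body
# would be POSTed to http://localhost:8080/generate with
# Content-Type: application/json; /generate_stream returns the same
# generation as an SSE token stream instead.
print(body)
```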

Key Capabilities
  • Tensor and pipeline parallelism, plus weight quantization (e.g. GPTQ, AWQ, bitsandbytes)
  • Paged KV cache (PagedAttention), continuous batching, and speculative decoding
  • Prometheus metrics and OpenTelemetry (OTLP) tracing hooks
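The capabilities above are mostly enabled at launch time. A hedged sketch of a launcher invocation (the model id, shard count, and port are example values; flags follow text-generation-launcher's CLI):

```shell
# Shard the model across 2 GPUs with AWQ quantization (example values;
# assumes the model's weights ship with AWQ quantization applied).
text-generation-launcher \
  --model-id meta-llama/Llama-2-7b-chat-hf \
  --num-shard 2 \
  --quantize awq \
  --port 8080
```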
