
SGLang

Open-source high-performance framework and DSL for serving large language & vision-language models with low-latency, controllable, structured generation.

Introduction

What is SGLang?

SGLang is an open-source serving engine and Structured Generation Language created by the LMSYS team to turbo-charge inference for large language models (LLMs) and vision-language models. By co-designing a fast backend runtime with a concise, Python-like frontend DSL, SGLang lets developers build multi-step, parallel, and structured generation pipelines while sustaining state-of-the-art throughput.
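As a rough sketch of what the frontend DSL looks like in practice (the server URL and port are illustrative defaults; this assumes a local SGLang server has already been launched, e.g. with `python -m sglang.launch_server`):

```python
import sglang as sgl

@sgl.function
def multi_turn_qa(s, question_1, question_2):
    # Each statement appends to the generation state `s`;
    # sgl.gen marks a slot for the model to fill in.
    s += sgl.user(question_1)
    s += sgl.assistant(sgl.gen("answer_1", max_tokens=64))
    s += sgl.user(question_2)
    s += sgl.assistant(sgl.gen("answer_2", max_tokens=64))

# Point the DSL at a running SGLang server (default port shown):
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = multi_turn_qa.run(
    question_1="What is RadixAttention?",
    question_2="Why does it reduce latency?",
)
print(state["answer_1"])
```

Because both turns live in one program, the runtime can keep the shared conversation prefix in its KV cache instead of re-prefilling it for each call.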

Key capabilities
  • RadixAttention & KV-cache reuse for efficient prefill/decoding
  • Continuous batching, speculative decoding, quantization (FP8/INT4/AWQ/GPTQ)
  • Prefill–decode disaggregation & expert parallelism to scale across GPUs
  • Frontend language primitives for control flow, tool/function calls, JSON/AST output, and multimodal inputs
  • Broad model support (Llama-3/4, DeepSeek, Mistral, Qwen, LLaVA, etc.) and OpenAI-style API compatibility
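The intuition behind RadixAttention's KV-cache reuse can be shown with a toy sketch. This is a simplification for illustration only, not SGLang's implementation: the "cache" below stores token lists and counts reusable positions, standing in for real per-token key/value tensors stored in a radix tree on the GPU.

```python
class PrefixKVCache:
    """Toy prefix cache: remembers previously seen token sequences so a
    new request only pays prefill cost for tokens past the longest
    shared prefix (a stand-in for real KV-tensor reuse)."""

    def __init__(self):
        self._prefixes = []  # previously cached token sequences

    def longest_shared_prefix(self, tokens):
        """Return how many leading tokens are already cached."""
        best = 0
        for cached in self._prefixes:
            n = 0
            for a, b in zip(cached, tokens):
                if a != b:
                    break
                n += 1
            best = max(best, n)
        return best

    def insert(self, tokens):
        self._prefixes.append(list(tokens))


def prefill_cost(cache, tokens):
    """Only tokens past the shared prefix need fresh prefill compute."""
    reused = cache.longest_shared_prefix(tokens)
    cache.insert(tokens)
    return len(tokens) - reused


cache = PrefixKVCache()
system = ["You", "are", "a", "helpful", "assistant", "."]
print(prefill_cost(cache, system + ["What", "is", "SGLang", "?"]))      # → 10 (cold cache)
print(prefill_cost(cache, system + ["Explain", "RadixAttention", "."]))  # → 3 (system prefix reused)
```

In SGLang the shared prefixes are organized in a radix tree rather than scanned linearly, which is what makes the reuse cheap even with many concurrent requests sharing system prompts or few-shot examples.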
Who is it for?

  • Engineers building low-latency chat, RAG, or agent systems
  • Researchers needing reproducible, high-throughput benchmarks
  • Platform teams seeking a production-grade, vendor-neutral inference stack

Released under the Apache-2.0 license, SGLang is now part of the PyTorch Ecosystem and powers trillions of tokens per day in production systems.

Information

  • Website: docs.sglang.ai
  • Authors: Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng
  • Published date: 2023/12/12
