
AIAny

Learn Anything about AI in one site.

support@aiany.app


Copyright © 2025 All Rights Reserved.


vLLM

vLLM is a high-throughput, memory-efficient inference and serving engine for large language models (LLMs), built to deliver state-of-the-art performance on GPUs with features such as PagedAttention and continuous batching.
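The idea behind PagedAttention can be illustrated with a toy block table: the key/value cache is split into fixed-size blocks, and each request keeps a table mapping its logical token positions to physical blocks that need not be contiguous. The sketch below is a simplified illustration in plain Python, not vLLM's actual implementation; `BlockAllocator`, `Sequence`, and `BLOCK_SIZE` are hypothetical names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per KV-cache block (illustrative value, an assumption)


class BlockAllocator:
    """Hands out fixed-size physical blocks from a shared pool (hypothetical helper)."""

    def __init__(self, num_blocks: int) -> None:
        self.free = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("KV-cache pool exhausted")
        return self.free.pop(0)

    def release(self, blocks: list[int]) -> None:
        # Blocks of a finished request go back to the pool for reuse.
        self.free.extend(blocks)


@dataclass
class Sequence:
    """One request: a token count plus its logical-to-physical block table."""

    num_tokens: int = 0
    block_table: list[int] = field(default_factory=list)

    def append_token(self, allocator: BlockAllocator) -> None:
        # A new physical block is needed only when the last one is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(allocator.allocate())
        self.num_tokens += 1

    def locate(self, position: int) -> tuple[int, int]:
        """Map a token position to (physical block, offset within that block)."""
        return self.block_table[position // BLOCK_SIZE], position % BLOCK_SIZE


allocator = BlockAllocator(num_blocks=8)
a, b = Sequence(), Sequence()

# Two requests grow in lockstep, so their blocks interleave in the pool.
for _ in range(6):
    a.append_token(allocator)
    b.append_token(allocator)

print(a.block_table)  # [0, 2]
print(b.block_table)  # [1, 3]
print(a.locate(5))    # (2, 1)
```

Because memory is claimed one block at a time, at most one partially filled block per request is wasted, and a finished request's blocks can be handed to a newly arrived one via `release`. That reuse is what makes it practical to admit new requests into a running batch.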

Information

Website: docs.vllm.ai
Authors: Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joey Gonzalez, Hao Zhang, Ion Stoica
Published date: 2023/06/20

Categories

AI Deploy

Tags

ai-development
ai-library
ai-inference
ai-serving

More Items

SGLang

2024

LMSYS Org

SGLang is a high-performance serving framework for large language models (LLMs) and vision-language models, designed for low-latency, high-throughput inference on anything from a single GPU to a large distributed cluster. Key features include RadixAttention for prefix caching, zero-overhead batch scheduling, prefill-decode disaggregation, speculative decoding, continuous batching, paged attention, tensor/pipeline/expert/data parallelism, structured outputs, chunked prefill, and quantization (FP4/FP8/INT4/AWQ/GPTQ). It supports a wide range of models, such as Llama, Qwen, and DeepSeek, runs on hardware from NVIDIA, AMD, and Intel as well as TPUs, and provides an intuitive frontend for LLM applications.

llm ai-serving ai-inference nvidia pytorch (+3 more)

stable-diffusion.cpp

2023

leejet

stable-diffusion.cpp is a pure C/C++ implementation of diffusion model inference, based on ggml, supporting models such as Stable Diffusion (SD1.x, SD2.x, SDXL), Flux, Wan, Qwen Image, Z-Image, and more. It is lightweight with no external dependencies, supports CPU, CUDA, Vulkan, and Metal backends, and offers features such as LoRA, ControlNet, and LCM for efficient local image generation on Linux, Mac, Windows, and Android.

github ai-image AIGC ai-inference ai-client

Cloudflare Vibe SDK

2024

Cloudflare

Cloudflare VibeSDK is an open-source full-stack AI webapp generator built on Cloudflare's developer platform. It enables users to describe apps in natural language, with AI agents creating and deploying React + TypeScript + Tailwind applications. Key features include phase-wise code generation, live previews in sandboxed containers, interactive chat, and one-click deployment to Workers for Platforms.

ai-coding ai-agent ai-development LLM