AIAny - ai-serving

vLLM

2023

Woosuk Kwon, Zhuohan Li +7

vLLM is a high-throughput, memory-efficient inference and serving engine for large language models (LLMs), built to deliver state-of-the-art performance on GPUs with features such as PagedAttention and continuous batching.

ai-development ai-library ai-inference ai-serving

KTransformers

2024

MADSys Lab, Tsinghua University, Approaching.AI +17

KTransformers is a flexible framework for trying out cutting-edge optimizations in LLM inference and fine-tuning, with a focus on CPU-GPU heterogeneous computing. It consists of two core modules: kt-kernel for high-performance inference kernels and kt-sft for fine-tuning. The project supports a range of hardware and models, including the DeepSeek series and Kimi-K2, and reports significant resource savings and speedups, such as reducing GPU memory for a 671B model to 70 GB and up to 28x acceleration.

github llm ai-inference ai-train ai-framework+3

SGLang

2024

LMSYS Org

SGLang is a high-performance serving framework for large language models (LLMs) and vision-language models, designed for low-latency, high-throughput inference on anything from a single GPU to a large distributed cluster. Key features include RadixAttention for prefix caching, zero-overhead batch scheduling, prefill-decode disaggregation, speculative decoding, continuous batching, paged attention, tensor/pipeline/expert/data parallelism, structured outputs, chunked prefill, and quantization (FP4/FP8/INT4/AWQ/GPTQ). It supports a wide range of models, such as Llama, Qwen, and DeepSeek, and runs on hardware from NVIDIA, AMD, and Intel as well as TPUs, with an intuitive frontend for building LLM applications.

llm ai-serving ai-inference nvidia pytorch+3

Ollama

2023

Jeffrey Morgan, Michael Chiang

A lightweight open-source platform for running, managing, and integrating large language models locally via a simple CLI and REST API.

ai-development ai-library ai-inference ai-serving LLM

TensorFlow Serving

2016

Google

An open-source, production-ready system for serving machine-learning models at scale.

ai-development ai-library ai-inference ai-serving google

TensorRT

2016

NVIDIA

NVIDIA TensorRT is an SDK and tool suite that compiles and optimizes trained neural-network models for high-throughput, low-latency inference on NVIDIA GPUs.

ai-development ai-library ai-inference ai-serving nvidia

ONNX

2017

ONNX Project Contributors, Meta (Facebook) +1

ONNX (Open Neural Network Exchange) is an open ecosystem that provides an open-source format for AI models, covering both deep learning and traditional ML. It defines an extensible computation graph model, built-in operators, and standard data types, with a focus on inference. Because it is widely supported across frameworks and hardware, it lets models move between tools without conversion work specific to each pair.

ai-framework mlops ai-inference ai-serving pytorch+2

NautilusTrader

2018

Nautech Systems Pty Ltd

NautilusTrader is an open-source, high-performance event-driven algorithmic trading platform and backtester by Nautech Systems. Its Rust-based core with Python bindings provides parity between research/backtest and production/live deployments, supports multi-venue and multi-asset strategies, advanced order types, optional high-precision numeric modes, and is fast enough to be used to train AI trading agents (RL/ES).

mlops ai-train ai-development ai-library github+4

PaddleOCR

2019

PaddlePaddle

PaddleOCR is an industry-leading, production-ready OCR and document AI engine developed by the PaddlePaddle team. It supports over 100 languages and converts PDFs or image documents into structured AI-friendly data (e.g., JSON and Markdown), bridging the gap between images/PDFs and LLMs. Key features include multilingual support, high accuracy, handwriting recognition, and advanced document parsing for elements like tables, formulas, and charts, with end-to-end tools for training, inference, and deployment.

github ai-tools ai-image vision ai-inference+2

YOLOv5

2020

Ultralytics

YOLOv5 is an open-source, PyTorch-based computer vision repository by Ultralytics, focused on real-time object detection with extended support for segmentation and classification. It is known for ease of use, speed, multiple pre-trained model sizes, and broad export/deployment support (ONNX, TFLite, CoreML, TensorRT). The repo includes training and inference scripts, tutorials, and integrations for production-ready workflows.

vision github ai-image ai-inference ai-serving+1

SkyPilot

2021

skypilot-org, Sky Computing Lab (UC Berkeley)

SkyPilot is an open-source MLOps / AI infrastructure project that provides a unified control plane and CLI to run, manage, and scale AI workloads on any compute: Kubernetes, Slurm, 20+ clouds, or on-prem clusters. It supports job-as-code (YAML/Python), intelligent scheduling and cost optimization (spot instances, autostop), automatic setup/sync of environments, auto-recovery, and integrations for training, serving, and inference workflows.

mlops ai-serving ai-train ai-workflow ai-inference+2

ExecuTorch

2022

PyTorch

ExecuTorch is PyTorch’s unified on-device AI deployment solution for mobile, embedded, and edge devices. It enables direct export from PyTorch, ahead-of-time compilation, quantization and hardware partitioning to produce compact runtime programs (.pte) that run across many backends (XNNPACK, Vulkan, CoreML, Qualcomm, etc.). It supports LLMs, vision, speech and multimodal models with a small runtime footprint and production tools for profiling, memory planning, and selective operator builds.

pytorch ai-inference ai-serving ai-framework mlops+3
