AIAny - AI Deploy

vLLM

2023

Woosuk Kwon, Zhuohan Li +7

vLLM is a high-throughput, memory-efficient inference and serving engine for large language models (LLMs), built to deliver state-of-the-art performance on GPUs with features such as PagedAttention and continuous batching.
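
The paging idea behind PagedAttention can be illustrated with a toy allocator. This is a minimal sketch, not vLLM's implementation: the class `PagedKVCache`, the constant `BLOCK_SIZE`, and all method names are hypothetical and chosen only for illustration. It shows how a per-sequence block table maps logical token positions to fixed-size physical blocks, so memory is allocated on demand rather than reserved up front.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per KV-cache block (illustrative value, not vLLM's default)


@dataclass
class PagedKVCache:
    """Toy allocator: each sequence owns a list of fixed-size physical blocks."""
    num_blocks: int
    free_blocks: list = field(default_factory=list)
    block_tables: dict = field(default_factory=dict)  # seq_id -> [physical block ids]
    lengths: dict = field(default_factory=dict)       # seq_id -> tokens stored so far

    def __post_init__(self):
        # Every physical block starts out free.
        self.free_blocks = list(range(self.num_blocks))

    def append_token(self, seq_id):
        """Reserve a slot for one more token; return (physical_block, offset)."""
        n = self.lengths.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if n % BLOCK_SIZE == 0:
            # The last block is full (or none exists yet): take a new one.
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop(0))
        self.lengths[seq_id] = n + 1
        return table[n // BLOCK_SIZE], n % BLOCK_SIZE

    def free(self, seq_id):
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because blocks go back to the pool as soon as a sequence finishes, a scheduler can admit new requests between decoding steps, which is the property that continuous batching relies on.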

ai-development ai-library ai-inference ai-serving

KTransformers

2024

MADSys Lab, Tsinghua University, Approaching.AI +17

KTransformers is a flexible framework for trying out cutting-edge optimizations in LLM inference and fine-tuning, with a focus on CPU-GPU heterogeneous computing. It consists of two core modules: kt-kernel for high-performance inference kernels and kt-sft for fine-tuning. The project supports a range of hardware and models, including the DeepSeek series and Kimi-K2, and reports significant resource savings and speedups, such as reducing the GPU memory needed for a 671B model to 70GB and up to 28x acceleration.

github llm ai-inference ai-train ai-framework+3

SGLang

2024

LMSYS Org

SGLang is a high-performance serving framework for large language models (LLMs) and vision-language models, designed for low-latency, high-throughput inference on anything from a single GPU to a large distributed cluster. Key features include RadixAttention for prefix caching, zero-overhead batch scheduling, prefill-decode disaggregation, speculative decoding, continuous batching, paged attention, tensor/pipeline/expert/data parallelism, structured outputs, chunked prefill, and quantization (FP4/FP8/INT4/AWQ/GPTQ). It supports a wide range of models, such as Llama, Qwen, and DeepSeek, runs on hardware from NVIDIA, AMD, and Intel as well as TPUs, and provides an intuitive frontend for building LLM applications.
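
Prefix caching, the idea RadixAttention is built around, can be sketched with a small data structure. The example below is an assumption-laden simplification, not SGLang's code: it uses a plain token-level trie instead of a compressed radix tree, stores no KV tensors, and the names `PrefixCache`, `insert`, and `match_prefix` are hypothetical. It only shows how a server could tell how many leading tokens of a new request were already processed for an earlier one.

```python
class PrefixCache:
    """Toy prefix cache: a token-level trie (a simplification of a radix tree)."""

    def __init__(self):
        self.root = {}  # nested dicts: token -> child node

    def insert(self, tokens):
        """Record a processed token sequence so later requests can reuse it."""
        node = self.root
        for tok in tokens:
            node = node.setdefault(tok, {})

    def match_prefix(self, tokens):
        """Return how many leading tokens of `tokens` are already cached."""
        node, matched = self.root, 0
        for tok in tokens:
            if tok not in node:
                break
            node = node[tok]
            matched += 1
        return matched
```

In a real server the matched length would decide how much of the prompt can skip prefill; here it is just an integer, and eviction is left out entirely.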

llm ai-serving ai-inference nvidia pytorch+3

Ollama

2023

Jeffrey Morgan, Michael Chiang

A lightweight open-source platform for running, managing, and integrating large language models locally via a simple CLI and REST API.
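
As a rough illustration of the REST API mentioned above, the sketch below builds a request for Ollama's `/api/generate` endpoint using only the Python standard library. It assumes a server listening on the default local address `http://localhost:11434` and a model that has already been pulled; the helper names `build_generate_request` and `generate` are hypothetical, and the model name passed in is whatever is installed locally.

```python
import json
import urllib.request

# Default local address of an Ollama server (an assumption; adjust if configured differently).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, prompt, url=OLLAMA_URL):
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt):
    """Send the request; requires a running Ollama server with `model` available."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Splitting request construction from sending keeps the payload easy to inspect without a running server; calling `generate` with the name of a locally installed model should return the generated text as a string.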

ai-development ai-library ai-inference ai-serving LLM

TensorFlow Serving

2016

Google

An open-source, production-ready system for serving machine-learning models at scale.

ai-development ai-library ai-inference ai-serving google

TensorRT

2016

NVIDIA

NVIDIA TensorRT is an SDK and tool-suite that compiles and optimizes trained neural-network models for ultra-fast, low-latency inference on NVIDIA GPUs.

ai-development ai-library ai-inference ai-serving nvidia

ONNX

2017

ONNX Project Contributors, Meta (Facebook) +1

ONNX (Open Neural Network Exchange) is an open ecosystem that provides an open source format for AI models, including deep learning and traditional ML. It defines an extensible computation graph model, built-in operators, and standard data types, focusing on inferencing capabilities. Widely supported across frameworks and hardware, it enables interoperability and accelerates AI innovation.

ai-framework mlops ai-inference ai-serving pytorch+2

Streamlit

2018

Adrien Treuille, Thiago Teixeira +1

Streamlit is an open-source app framework that turns Python scripts into shareable web apps in minutes. It enables data scientists and AI/ML engineers to build interactive data apps like dashboards, reports, or chat apps using pure Python, without front-end experience.

github ai-tools ai-client mlops ai-development+1

SkyPilot

2021

skypilot-org, Sky Computing Lab (UC Berkeley)

SkyPilot is an open-source MLOps and AI infrastructure project that provides a unified control plane and CLI to run, manage, and scale AI workloads on any compute: Kubernetes, Slurm, 20+ clouds, or on-prem clusters. It supports job-as-code (YAML/Python), intelligent scheduling and cost optimization (spot instances, autostop), automatic environment setup and sync, auto-recovery, and integrations for training, serving, and inference workflows.

mlops ai-serving ai-train ai-workflow ai-inference+2

ExecuTorch

2022

PyTorch

ExecuTorch is PyTorch’s unified on-device AI deployment solution for mobile, embedded, and edge devices. It enables direct export from PyTorch, ahead-of-time compilation, quantization and hardware partitioning to produce compact runtime programs (.pte) that run across many backends (XNNPACK, Vulkan, CoreML, Qualcomm, etc.). It supports LLMs, vision, speech and multimodal models with a small runtime footprint and production tools for profiling, memory planning, and selective operator builds.

pytorch ai-inference ai-serving ai-framework mlops+3

FunASR

2022

Alibaba DAMO Academy, Northwestern Polytechnical University (NWPU) +5

FunASR is an open-source, end-to-end automatic speech recognition (ASR) toolkit led by Alibaba DAMO Academy. It supports ASR, voice activity detection (VAD), punctuation restoration, speaker verification and diarization, multi-talker ASR, emotion recognition, and more. FunASR provides many industrial-grade pretrained models, inference scripts, and deployment runtimes for research and production use.

ASR audio pytorch ai-library huggingface+4

Pathway

2022

pathwaycom (Pathway team)

Pathway is a Python ETL and live data framework combining a user-friendly Python API with a high-performance Rust engine. It supports both batch and streaming pipelines, stateful transformations, persistence, numerous connectors, and includes LLM/RAG tooling for real-time analytics and live LLM pipelines.

github mlops ai-framework LLM RAG+4

