AI Deploy2021

KServe

Serves predictive and generative ML models on Kubernetes via a single InferenceService CRD, with scale-to-zero, canary rollouts, and an OpenAI-compatible LLM path on vLLM. One autoscaling abstraction over PyTorch, XGBoost, ONNX, and HuggingFace.

Visit Website

Introduction

Every team eventually rebuilds the same plumbing around a model: an autoscaler, a router, health checks, a metrics sidecar, a way to shift 5% of traffic to a new version. KServe's bet is that this plumbing is a solved, declarative problem — you describe the desired serving state in one InferenceService resource and the controller reconciles the rest, the same way Deployments removed the need to script rolling updates by hand.

What Sets It Apart

One CRD spans the whole spectrum: classic predictors (scikit-learn, XGBoost, ONNX) and LLMs sit behind the same spec, so the operational surface doesn't fork as you add generative workloads.
The generative path exposes OpenAI-compatible endpoints over vLLM/llm-d, meaning existing client code points at a self-hosted model with a URL swap rather than an SDK rewrite.
Serverless underpinnings give true scale-to-zero, so idle models cost nothing — a real difference when you run dozens of low-traffic endpoints.
InferenceGraph lets you wire transformers, predictors, and ensembles into a DAG declaratively, instead of gluing services together in application code.

Who It's For

Great fit if you already run Kubernetes and want canary deploys, drift detection, and autoscaling without hand-rolling each piece, or want one platform covering both tabular models and LLMs. Look elsewhere if you have no cluster and just need a model behind an API — the Knative/Istio dependencies and CRD surface are real operational weight, and a managed endpoint will get you there faster.

Back

Information

Websitekserve.github.io
OrganizationsGoogle, IBM, Bloomberg, NVIDIA, Seldon, Cloud Native Computing Foundation
AuthorsKServe community
Published date2021/09/27

More Items

AI Deploy2026

Openship

Deploy and manage applications and containers to your own servers or Openship Cloud from a single desktop, web, or CLI interface. Built-in CI/CD with push-to-deploy and preview environments, automatic SSL, managed databases, CDN, backups, and multi-node portability for VPS-to-production workflows.

ai-deploy mLOps mcp docker cli+5

AI API2026

CPA Manager Plus

seakee

Self-hosted CPA / CLIProxyAPI management and observability panel that stores request history, tracks cost/usage/quota, and centralizes provider/credential/OAuth and plugin management. Designed for local analytics, failure diagnosis and account automation without telemetry.

ai-api-management mLOps docker sqlite go+9

Reinforcement Learning Papers2026

LongStraw: Long-Context RL Beyond 2M Tokens under a Fixed GPU Budget

Changhai Zhou, Kieran Liu +18

Enables RL post-training with million-token prompts under a fixed GPU budget by evaluating shared prompt state without autograd, retaining only minimal model state, and replaying short response branches; instantiated as GRPO and demonstrated on Qwen3.6-27B and GLM-5.2 up to multi-million token execution.

RL llm qwen mLOps ai-train+1