Runs LLMs, vision, audio and multimodal models locally with an OpenAI-compatible API, supporting CPU-only and GPU acceleration across 35+ backends. Includes built-in agents, multi-user access controls, a model gallery, and privacy-first local inference.