Most Claude Code users are tied to Anthropic’s API for backend inference; this project flips that dependency by acting as a lightweight Anthropic-compatible proxy that forwards Claude Code requests to alternative providers (NVIDIA NIM, OpenRouter), or to fully local backends (LM Studio, llama.cpp). The core insight is pragmatic: preserve Claude Code’s client UX while swapping the expensive/remote Anthropic endpoint for cheaper or local inference backends.
What Sets It Apart
-
Per-model provider mapping: route Opus/Sonnet/Haiku requests to different backends so each Claude model can use the provider best suited for its capabilities. So what: you can mix local and hosted models in one session without changing Claude Code itself.
-
Multiple provider options (NVIDIA NIM, OpenRouter, LM Studio, llama.cpp): choose a free hosted tier, a large model catalog, or fully local inference. So what: you trade off latency, privacy, and cost with a simple env var configuration.
-
Claude-native features preserved: parses
<think>tokens and reasoning_content into native Claude thinking blocks, and converts Anthropic-format streaming to OpenAI-compatible streams. So what: many Claude-specific behaviors (thinking tokens, tool-like outputs) remain usable even when the backend changes. -
Request optimization and smart rate limiting: local short-circuiting of trivial requests and rolling-window throttles reduce provider quota use. So what: common metadata calls and probes won’t quickly exhaust free-rate quotas.
-
Messaging platform support: built-in Discord/Telegram bot and session persistence let you control Claude Code remotely. So what: useful for remote demos, testing, or lightweight automation without a separate orchestration layer.
Who It's For & Trade-offs
Great fit if you want to run Claude Code against free or local models (experimentation, privacy, cost reduction), need a drop-in replacement without modifying the client, or want multi-provider routing for model diversity.
Look elsewhere if you need guaranteed parity with Anthropic’s hosted Claude in every edge case, or if you require an officially supported Anthropic integration: some advanced Claude features or billing/usage semantics may differ across providers. Also note provider-specific limits (e.g., NVIDIA NIM free-tier rate caps), local model hardware requirements, and potential behavioral differences when translating formats between Anthropic and OpenAI-compatible endpoints.
Practical notes: the proxy is intentionally minimal and config-driven (env vars for provider keys and model mappings). It reduces friction for developers but relies on the chosen provider’s stability and policy terms; evaluate provider quotas, latency, and any acceptable-use constraints before production deployment.
