Coding agents
Use Zumik as a drop-in OpenAI-compatible endpoint for Cline, Roo Code, Continue, Aider, and the OpenAI SDKs - one base URL, one key, full request fidelity, and caching on by default.
Most coding agents already speak the OpenAI wire format and let you point them at a custom base URL. That is all Zumik needs. Set the base URL to https://api.zumik.ai/v1, use a Zumik API key, and pick a model - the agent keeps working unchanged, and you get provider prompt caching and a cheaper service tier on latency-tolerant traffic without touching the request body.
This page is the drop-in configuration. For a deeper integration that pins a stable repo policy and tool bundle as reused state on /v2, see the coding-agent example.
Three settings
Any OpenAI-compatible agent needs the same three values:
| Setting | Value |
|---|---|
| Base URL | https://api.zumik.ai/v1 |
| API key | your Zumik key (zk_...) |
| Model | a Zumik alias (code.fast, code.balanced, ...) or a concrete provider model id |
The model field accepts either a concrete provider model id (gpt-4o, a Fireworks accounts/fireworks/models/... id) or a Zumik routing alias. An alias resolves server-side to a pinned provider release and reports the resolved release on the response headers, so a vanilla OpenAI client ignores the extra signal cleanly. The seeded aliases are code.fast, code.balanced, code.cheapest, auto.fast, auto.balanced, auto.cheapest, reasoning.best, vision.balanced, and the semantic-routing auto.semantic.
Tip
Never hardcode the key. Read it from an environment variable or your agent's secret store.
Cline
Cline accepts an OpenAI-compatible provider. In Settings -> API Provider:
Choose the provider
Set API Provider to OpenAI Compatible.
Set the base URL
Base URL: https://api.zumik.ai/v1
Set the key
API Key: your Zumik key (zk_...).
Set the model
Model ID: a Zumik alias such as code.balanced, or a concrete model id such as gpt-4o.
Roo Code uses the same OpenAI Compatible provider with the identical three fields.
Continue
Continue is configured in ~/.continue/config.yaml. Add Zumik as an OpenAI-compatible model:
models:
- name: Zumik (code.balanced)
provider: openai
model: code.balanced
apiBase: https://api.zumik.ai/v1
apiKey: ${{ secrets.ZUMIK_API_KEY }}provider: openai tells Continue to use the OpenAI wire format; apiBase redirects it to Zumik. The model is a Zumik alias or a concrete provider model id.
Aider and the OpenAI SDKs
Aider and the official OpenAI SDKs read the standard OpenAI environment variables, so the change is the base URL and the key.
export ZUMIK_API_KEY="zk_..."curl https://api.zumik.ai/v1/chat/completions \
-H "Authorization: Bearer $ZUMIK_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "code.balanced",
"messages": [
{"role": "user", "content": "Write a Python function that reverses a linked list."}
]
}'The chat endpoint is POST /v1/chat/completions. The same /v1 base URL also serves /v1/responses, /v1/embeddings, and /v1/batches for agents that use them.
Full request fidelity
Whatever your agent already sends, Zumik forwards to the resolved provider. The body shape stays exactly OpenAI's:
- Function calling -
toolsandtool_choicepass through, so an agent's tool loop works unchanged. - Multimodal content - multi-part / structured
content(text plus image parts) passes through. - Structured output -
response_format(including JSON schema) passes through. - Sampling -
temperature,top_p,max_tokens,stop,seed, and the rest pass through. - Streaming -
stream: truereturns an SSE token stream. Setstream_options.include_usageto get the trailing usage chunk.
This is the OpenAI compatibility contract: no proprietary fields in the request or response body, so a vanilla OpenAI client stays correct.
Cheaper automatically
You do not change the request to get the savings. On every eligible call Zumik engages provider-native prompt caching for you: Anthropic cache_control breakpoints on stable blocks, and an OpenAI prompt_cache_key derived from the system prompt. A latency-tolerant request can also ride a cheaper service tier, while interactive traffic is never slowed for it. Where reuse applies, the discount is reflected in the standard usage.prompt_tokens_details.cached_tokens field, so it stays measurable with a stock client.
Caching rewards a stable prefix. Keep system instructions, tools, and schema at the front of the request and push volatile content to the end. The prompt caching guide explains the per-provider mechanics.
Optional cost levers
When you want explicit control, these request-body fields are forwarded to the resolved provider when set:
| Field | Values |
|---|---|
service_tier | auto, default, flex, scale, priority |
verbosity | low, medium, high |
reasoning_effort | minimal, low, medium, high |
prompt_cache_key | any string; pins the cache key instead of the auto-derived one |
A caller-supplied service_tier always wins; leave it unset and Zumik picks a cheaper tier only for latency-tolerant work. An out-of-vocabulary value is rejected with a 400 naming the field rather than burning a provider round trip.
Prompt linter
zumik lint and the web prompt-linter check a prompt's layout for the structure that defeats provider-native caching - volatile content in the stable prefix, bad ordering, and a sub-1024-token prefix.
LangChain
ChatZumik and ZumikEmbeddings for LangChain (Python and JavaScript) - thin configurations of the LangChain OpenAI wrappers pinned to Zumik's /v1 surface.