Coding agents

Use Zumik as a drop-in OpenAI-compatible endpoint for Cline, Roo Code, Continue, Aider, and the OpenAI SDKs - one base URL, one key, full request fidelity, and caching on by default.

Most coding agents already speak the OpenAI wire format and let you point them at a custom base URL. That is all Zumik needs. Set the base URL to https://api.zumik.ai/v1, use a Zumik API key, and pick a model - the agent keeps working unchanged, and you get provider prompt caching and a cheaper service tier on latency-tolerant traffic without touching the request body.

This page is the drop-in configuration. For a deeper integration that pins a stable repo policy and tool bundle as reused state on /v2, see the coding-agent example.

Three settings

Any OpenAI-compatible agent needs the same three values:

| Setting | Value |
| --- | --- |
| Base URL | `https://api.zumik.ai/v1` |
| API key | your Zumik key (`zk_...`) |
| Model | a Zumik alias (`code.fast`, `code.balanced`, ...) or a concrete provider model id |

The model field accepts either a concrete provider model id (gpt-4o, a Fireworks accounts/fireworks/models/... id) or a Zumik routing alias. An alias resolves server-side to a pinned provider release. Zumik reports the resolved release in the response headers rather than in the body, so a vanilla OpenAI client ignores the extra signal cleanly. The seeded aliases are code.fast, code.balanced, code.cheapest, auto.fast, auto.balanced, auto.cheapest, reasoning.best, vision.balanced, and the semantic-routing alias auto.semantic.

Tip

Never hardcode the key. Read it from an environment variable or your agent's secret store.
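
A minimal sketch of the tip above, assuming the key is stored in an environment variable named ZUMIK_API_KEY. The helper name load_zumik_key is hypothetical, and the zk_ prefix check only mirrors the key format shown in the settings table.

```python
import os


def load_zumik_key(env_var: str = "ZUMIK_API_KEY") -> str:
    """Return the Zumik API key from the environment, failing loudly if it is absent."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it instead of hardcoding the key")
    if not key.startswith("zk_"):
        # Assumption: Zumik keys start with "zk_", as in the settings table.
        raise RuntimeError(f"{env_var} does not look like a Zumik key (expected a zk_ prefix)")
    return key
```

Failing at startup with a clear message is usually easier to debug than a 401 from the first request.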

Cline

Cline accepts an OpenAI-compatible provider. In Settings -> API Provider:

Choose the provider

Set API Provider to OpenAI Compatible.

Set the base URL

Base URL: https://api.zumik.ai/v1

Set the key

API Key: your Zumik key (zk_...).

Set the model

Model ID: a Zumik alias such as code.balanced, or a concrete model id such as gpt-4o.

Roo Code uses the same OpenAI Compatible provider with the identical three fields.

Continue

Continue is configured in ~/.continue/config.yaml. Add Zumik as an OpenAI-compatible model:

models:
  - name: Zumik (code.balanced)
    provider: openai
    model: code.balanced
    apiBase: https://api.zumik.ai/v1
    apiKey: ${{ secrets.ZUMIK_API_KEY }}

provider: openai tells Continue to use the OpenAI wire format; apiBase redirects it to Zumik. The model is a Zumik alias or a concrete provider model id.

Aider and the OpenAI SDKs

Aider and the official OpenAI SDKs read the standard OpenAI environment variables (OPENAI_API_KEY, plus OPENAI_BASE_URL for the SDKs and OPENAI_API_BASE for Aider), so the only change is the base URL and the key.

export ZUMIK_API_KEY="zk_..."

curl

curl https://api.zumik.ai/v1/chat/completions \
  -H "Authorization: Bearer $ZUMIK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "code.balanced",
    "messages": [
      {"role": "user", "content": "Write a Python function that reverses a linked list."}
    ]
  }'

The chat endpoint is POST /v1/chat/completions. The same /v1 base URL also serves /v1/responses, /v1/embeddings, and /v1/batches for agents that use them.
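
For agents or scripts that do not ship an OpenAI SDK, the same call can be sketched with the Python standard library alone. This is an illustration, not an official client: the helper names build_chat_request and send are hypothetical, and the key is assumed to be in ZUMIK_API_KEY as in the shell example.

```python
import json
import os
import urllib.request
from typing import Optional

ZUMIK_BASE_URL = "https://api.zumik.ai/v1"


def build_chat_request(prompt: str, model: str = "code.balanced",
                       api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    key = api_key or os.environ["ZUMIK_API_KEY"]
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{ZUMIK_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling send(build_chat_request("Write a Python function that reverses a linked list.")) should return an OpenAI-shaped completion whose text sits at choices[0].message.content.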

Full request fidelity

Whatever your agent already sends, Zumik forwards to the resolved provider. The body shape stays exactly OpenAI's:

Function calling - tools and tool_choice pass through, so an agent's tool loop works unchanged.
Multimodal content - multi-part / structured content (text plus image parts) passes through.
Structured output - response_format (including JSON schema) passes through.
Sampling - temperature, top_p, max_tokens, stop, seed, and the rest pass through.
Streaming - stream: true returns an SSE token stream. Set stream_options.include_usage to get the trailing usage chunk.

This is the OpenAI compatibility contract: no proprietary fields in the request or response body, so a vanilla OpenAI client stays correct.
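
As an illustration of the streaming case, the sketch below consumes an OpenAI-style SSE stream that was requested with stream: true and stream_options.include_usage. It assumes the standard OpenAI chunk shape (data: lines, a final data: [DONE] sentinel, and a trailing chunk that carries usage); the function name collect_stream is hypothetical.

```python
import json
from typing import Iterable, Optional, Tuple

# Request-body fields that enable streaming with a trailing usage chunk.
STREAM_FIELDS = {"stream": True, "stream_options": {"include_usage": True}}


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Concatenate streamed text deltas and return them with the usage block, if any."""
    text_parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # The trailing usage chunk has a populated "usage" and an empty "choices" list.
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                text_parts.append(delta["content"])
    return "".join(text_parts), usage
```

Feed it the decoded lines of the HTTP response body; the returned usage is None when include_usage was not set.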

Cheaper automatically

You do not change the request to get the savings. On every eligible call Zumik engages provider-native prompt caching for you: Anthropic cache_control breakpoints on stable blocks, and an OpenAI prompt_cache_key derived from the system prompt. A latency-tolerant request can also ride a cheaper service tier, while interactive traffic is never slowed for it. Where reuse applies, the discount is reflected in the standard usage.prompt_tokens_details.cached_tokens field, so it stays measurable with a stock client.
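
To check whether caching is actually engaging, a stock client can read the cached-token count off the usage block. A minimal sketch, assuming the response carries the standard OpenAI usage shape; the helper name cached_fraction is hypothetical.

```python
def cached_fraction(usage: dict) -> float:
    """Return the share of prompt tokens that were served from the provider cache."""
    prompt_tokens = usage.get("prompt_tokens", 0)
    # prompt_tokens_details may be missing or null when nothing was cached.
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    return cached / prompt_tokens if prompt_tokens else 0.0
```

Logging this value per request gives a rough view of how much of the prompt is being reused over a session.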

Caching rewards a stable prefix. Keep system instructions, tools, and schema at the front of the request and push volatile content to the end. The prompt caching guide explains the per-provider mechanics.
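
The ordering advice can be sketched as follows. Both helper names are hypothetical, and the message-level comparison is only an approximation: providers match cached prefixes at the token level, with their own minimum lengths.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_messages(system_prompt: str, history: List[Message], new_input: str) -> List[Message]:
    """Put stable content first (system prompt, prior turns) and volatile content last."""
    return [
        {"role": "system", "content": system_prompt},
        *history,
        {"role": "user", "content": new_input},
    ]


def shared_prefix_len(a: List[Message], b: List[Message]) -> int:
    """Count the leading messages that two requests have in common."""
    count = 0
    for left, right in zip(a, b):
        if left != right:
            break
        count += 1
    return count
```

Two requests that differ only in the final user message share every earlier message. Putting something volatile, such as a timestamp, into the system prompt drops the shared prefix to zero.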

Optional cost levers

When you want explicit control, these request-body fields are forwarded to the resolved provider when set:

| Field | Values |
| --- | --- |
| `service_tier` | `auto`, `default`, `flex`, `scale`, `priority` |
| `verbosity` | `low`, `medium`, `high` |
| `reasoning_effort` | `minimal`, `low`, `medium`, `high` |
| `prompt_cache_key` | any string; pins the cache key instead of the auto-derived one |

A caller-supplied service_tier always wins; leave it unset and Zumik picks a cheaper tier only for latency-tolerant work. An out-of-vocabulary value is rejected with a 400 naming the field rather than burning a provider round trip.
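
A client can mirror that vocabulary check locally before sending. The sketch below is illustrative only: the helper name with_cost_levers is hypothetical, the allowed values are copied from the table above, and the server remains the source of truth.

```python
COST_LEVER_VALUES = {
    "service_tier": {"auto", "default", "flex", "scale", "priority"},
    "verbosity": {"low", "medium", "high"},
    "reasoning_effort": {"minimal", "low", "medium", "high"},
}


def with_cost_levers(body: dict, **levers: str) -> dict:
    """Return a copy of the request body with validated cost levers merged in."""
    merged = dict(body)
    for field, value in levers.items():
        if field == "prompt_cache_key":
            # Any string is accepted as a cache key.
            if not isinstance(value, str):
                raise ValueError("prompt_cache_key: expected a string")
        elif field not in COST_LEVER_VALUES:
            raise ValueError(f"{field}: not a known cost lever")
        elif value not in COST_LEVER_VALUES[field]:
            allowed = ", ".join(sorted(COST_LEVER_VALUES[field]))
            raise ValueError(f"{field}: {value!r} is not one of {allowed}")
        merged[field] = value
    return merged
```

For example, with_cost_levers({"model": "code.balanced"}, service_tier="flex") returns a body with the tier set explicitly, while verbosity="loud" raises a ValueError that names the field.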

Model aliases

How code.fast and auto.balanced resolve to a pinned provider release.

Coding-agent example

Pin a repo policy and tool bundle as reused state on /v2.