Zumik
Guides

OpenAI migration guide

Point an existing OpenAI client at Zumik with no body changes - the full /v1 compatibility surface, the base-url swap, what carries over unchanged, the optional Agent-* headers, and a testing checklist.

Zumik's /v1 is a strict OpenAI-compatible surface. Swap the base URL and your existing client keeps working: the request and response JSON stays exactly OpenAI-shaped. Everything proprietary - reuse signals, execution profile, QoS outcome, idempotent-replay markers - rides on optional Agent-* headers, never in the body. So a stock OpenAI SDK reads a Zumik response the same way it reads an OpenAI one, and the extra signal is there for clients that look for it.

The base-url swap

Set the base URL to https://api.zumik.ai/v1 and use a Zumik API key as the bearer token. No other code changes are required.

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.zumik.ai/v1",
    api_key="zk_live_...",  # a Zumik key, not an OpenAI key
)

r = client.chat.completions.create(
    model="gpt-4o",  # or a Zumik alias like code.balanced
    messages=[{"role": "user", "content": "Summarize the failing test."}],
)
print(r.choices[0].message.content)
print(r.usage.prompt_tokens_details.cached_tokens)  # reuse stays visible

The API key is a Zumik key (zk_live_…), not your provider key. To call providers with your own credential instead, attach a BYOK key - the base url and body stay the same.

The compatibility surface

These /v1 endpoints are live today and accept the exact OpenAI request shapes:

EndpointNotes
POST /v1/chat/completionsChat Completions, including stream and stream_options.include_usage. See Streaming.
POST /v1/responsesResponses API. input accepts a bare string or the structured input-item array.
GET /v1/responses/{id}Retrieve a stored response.
POST /v1/responses/{id}/cancelCancel an in-flight response.
POST /v1/responses/input_tokensCount input tokens without generating.
POST /v1/embeddingsEmbeddings, OpenAI request and response shape.
POST /v1/batchesBatch jobs over an uploaded input file. See the Files API.
GET /v1/modelsReturns the minimal OpenAI model-object shape (id, object, created, owned_by).
GET /v1/models/{id}Retrieve a single model object.

Keep /v1 bodies clean. Do not add proprietary JSON fields to an OpenAI-compatible request or response - it breaks the contract and a strict client will reject it. When you need Zumik-specific behavior, use an Agent-* header or move to the native /v2 surface.

What carries over unchanged

  • Request and response body shapes. Same fields, same nesting, same types as OpenAI.
  • SDK usage patterns. The official OpenAI Python and TypeScript SDKs work with only the base-url and key change.
  • Bearer authentication. Authorization: Bearer <key> exactly as before. See Authentication.
  • usage reporting. Including usage.prompt_tokens_details.cached_tokens, so prompt-cache capture stays measurable with a vanilla client - see Prompt caching.
  • Streaming. stream: true works end to end, with the trailing usage chunk gated on stream_options.include_usage.
  • The OpenAI error envelope. Errors return { "error": { "message", "type", "code", "param" } }, so existing error handling and retry/backoff logic behaves identically. See Errors.

What changes

  • Base URL points to https://api.zumik.ai/v1.
  • Models can be aliases. Alongside concrete model ids you can pass a Zumik alias like code.fast or auto.balanced, which resolves to an immutable release. See Model aliases.
  • Stateful and diagnostic features live under /v2 - sessions, branches, snapshots, diagnostics, replay, and purge receipts are native /v2 objects, not /v1 extensions.

Optional Agent-* headers

Proprietary behavior travels on headers so the body stays OpenAI-exact. You never need these to get a working response; they are there when you want control or signal.

Send these on a request:

HeaderPurpose
Agent-Hints-Ref: ah_…Reference a stored Agent Hints object (QoS, reuse, routing, safety intent).
Agent-Hints: <base64url>Inline a base64url-encoded hints JSON for a one-off request.
Agent-Idempotency-Key: <key>Make a mutating request safely retryable. See Idempotency and retries.

Read these off a response:

HeaderConveys
Agent-Trace-Id: trc_…The trace id; look up the full per-request record on /v2/usage.
Agent-Execution-Profilemanaged_provider, byok, subscription, byoc_dynamo, or byoc_epp.
Agent-Resolved-ProviderThe provider that served the request, e.g. openai, anthropic.
Agent-RegionThe region the request resolved to under the project's regional policy.
Agent-QoS-Admission / Agent-QoS-Target-Met / Agent-QoS-Fallback-UsedThe compact QoS outcome.
Agent-Idempotent-Replaytrue if this response was replayed from a prior idempotent call.
Agent-Hints-HonoredComma-separated sections the platform honored, e.g. qos,reuse.

The full QoS outcome and reuse waterfall are not in the /v1 body - they live on /v2/usage, keyed by Agent-Trace-Id.

Testing checklist

Swap the base URL and key

Point the client at https://api.zumik.ai/v1 with a zk_live_… key. Confirm a basic chat.completions.create returns content and a populated usage object.

Diff the response shape

Assert your existing response parsing still passes - choices[0].message.content, finish_reason, and usage.total_tokens are all present and identically typed.

Exercise streaming

Set stream: true. Confirm chat.completion.chunk frames arrive and the stream ends with data: [DONE]. Set stream_options.include_usage: true and confirm the trailing usage chunk.

Check the Agent-* headers

Read Agent-Trace-Id, Agent-Execution-Profile, and Agent-Resolved-Provider off a response to confirm signal is flowing, then look the trace up on /v2/usage.

Confirm error parity

Trigger a 429 (a tight per-key budget) and a 400 (empty messages). Confirm the error body is OpenAI-shaped and your retry logic treats the 429 as a quota error.

Try an alias

Swap a concrete model id for an alias like code.balanced and confirm it resolves. The resolved target comes back on Agent-Resolved-Provider.

On this page