OpenAI migration guide
Point an existing OpenAI client at Zumik with no body changes - the full /v1 compatibility surface, the base-url swap, what carries over unchanged, the optional Agent-* headers, and a testing checklist.
Zumik's /v1 is a strictly OpenAI-compatible surface. Swap the base URL and your existing client keeps
working: the request and response JSON stay exactly OpenAI-shaped. Everything proprietary - reuse
signals, execution profile, QoS outcome, idempotent-replay markers - rides on optional Agent-*
headers, never in the body. A stock OpenAI SDK therefore reads a Zumik response the same way it reads
an OpenAI one, and the extra signal is there for clients that look for it.
The base-url swap
Set the base URL to https://api.zumik.ai/v1 and use a Zumik API key as the bearer token. No other
code changes are required.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.zumik.ai/v1",
        api_key="zk_live_...",  # a Zumik key, not an OpenAI key
    )
    r = client.chat.completions.create(
        model="gpt-4o",  # or a Zumik alias like code.balanced
        messages=[{"role": "user", "content": "Summarize the failing test."}],
    )
    print(r.choices[0].message.content)
    print(r.usage.prompt_tokens_details.cached_tokens)  # reuse stays visible

The API key is a Zumik key (zk_live_…), not your provider key. To call providers with your own
credential instead, attach a BYOK key - the base URL and body stay the same.
The compatibility surface
These /v1 endpoints are live today and accept the exact OpenAI request shapes:
| Endpoint | Notes |
|---|---|
| POST /v1/chat/completions | Chat Completions, including stream and stream_options.include_usage. See Streaming. |
| POST /v1/responses | Responses API. input accepts a bare string or the structured input-item array. |
| GET /v1/responses/{id} | Retrieve a stored response. |
| POST /v1/responses/{id}/cancel | Cancel an in-flight response. |
| POST /v1/responses/input_tokens | Count input tokens without generating. |
| POST /v1/embeddings | Embeddings, OpenAI request and response shape. |
| POST /v1/batches | Batch jobs over an uploaded input file. See the Files API. |
| GET /v1/models | Returns the minimal OpenAI model-object shape (id, object, created, owned_by). |
| GET /v1/models/{id} | Retrieve a single model object. |
Keep /v1 bodies clean. Do not add proprietary JSON fields to an OpenAI-compatible request or
response - it breaks the contract and a strict client will reject it. When you need Zumik-specific
behavior, use an Agent-* header or move to the native /v2 surface.
What carries over unchanged
- Request and response body shapes. Same fields, same nesting, same types as OpenAI.
- SDK usage patterns. The official OpenAI Python and TypeScript SDKs work with only the base-url and key change.
- Bearer authentication. Authorization: Bearer <key> exactly as before. See Authentication.
- usage reporting. Including usage.prompt_tokens_details.cached_tokens, so prompt-cache capture stays measurable with a vanilla client - see Prompt caching.
- Streaming. stream: true works end to end, with the trailing usage chunk gated on stream_options.include_usage.
- The OpenAI error envelope. Errors return { "error": { "message", "type", "code", "param" } }, so existing error handling and retry/backoff logic behaves identically. See Errors.
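Because the error envelope keeps the OpenAI shape, a client that parses raw HTTP bodies can reuse one parser for both providers. The following is a minimal sketch, not Zumik's own code: `ApiError` and `parse_error_envelope` are hypothetical names, and the sample body's `message` and `type` values are made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    message: str
    type: Optional[str]
    code: Optional[str]
    param: Optional[str]


def parse_error_envelope(raw: str) -> Optional[ApiError]:
    """Return an ApiError if raw is an OpenAI-shaped error body, else None."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return None
    err = doc.get("error") if isinstance(doc, dict) else None
    if not isinstance(err, dict) or "message" not in err:
        return None
    return ApiError(
        message=str(err["message"]),
        type=err.get("type"),
        code=err.get("code"),
        param=err.get("param"),
    )


# Illustrative body only; real message and type strings may differ.
sample = (
    '{"error": {"message": "messages must not be empty", '
    '"type": "invalid_request_error", "code": null, "param": "messages"}}'
)
parsed = parse_error_envelope(sample)
print(parsed)
```

Returning `None` for anything that is not an envelope lets callers fall back to generic HTTP-status handling without raising a second exception.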
What changes
- Base URL points to https://api.zumik.ai/v1.
- Models can be aliases. Alongside concrete model ids you can pass a Zumik alias like code.fast or auto.balanced, which resolves to an immutable release. See Model aliases.
- Stateful and diagnostic features live under /v2 - sessions, branches, snapshots, diagnostics, replay, and purge receipts are native /v2 objects, not /v1 extensions.
Optional Agent-* headers
Proprietary behavior travels on headers so the body stays OpenAI-exact. You never need these to get a working response; they are there when you want control or signal.
Send these on a request:
| Header | Purpose |
|---|---|
| Agent-Hints-Ref: ah_… | Reference a stored Agent Hints object (QoS, reuse, routing, safety intent). |
| Agent-Hints: <base64url> | Inline a base64url-encoded hints JSON for a one-off request. |
| Agent-Idempotency-Key: <key> | Make a mutating request safely retryable. See Idempotency and retries. |
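Building the inline Agent-Hints value might look like the sketch below. The hints schema is not specified on this page, so the field names in `hints` are placeholders, and stripping the base64 padding is an assumption; keep the `=` characters if the server expects them. `encode_agent_hints` and `decode_agent_hints` are hypothetical helper names.

```python
import base64
import json
import uuid


def encode_agent_hints(hints: dict) -> str:
    """Serialize hints to compact JSON and base64url-encode the bytes."""
    raw = json.dumps(hints, separators=(",", ":"), sort_keys=True).encode("utf-8")
    # Assumption: unpadded base64url. Remove .rstrip("=") to keep padding.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_agent_hints(value: str) -> dict:
    """Inverse of encode_agent_hints; restores padding before decoding."""
    padded = value + "=" * (-len(value) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# Placeholder payload: these keys are NOT the documented hints schema.
hints = {"qos": {"priority": "interactive"}, "reuse": {"enabled": True}}

headers = {
    "Agent-Hints": encode_agent_hints(hints),
    # Any unique string you generate per logical operation.
    "Agent-Idempotency-Key": str(uuid.uuid4()),
}
print(headers)
```

With the OpenAI Python SDK, a dict like this can be passed per request through the `extra_headers` argument of `create(...)`, which leaves the JSON body untouched.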
Read these off a response:
| Header | Conveys |
|---|---|
| Agent-Trace-Id: trc_… | The trace id; look up the full per-request record on /v2/usage. |
| Agent-Execution-Profile | managed_provider, byok, subscription, byoc_dynamo, or byoc_epp. |
| Agent-Resolved-Provider | The provider that served the request, e.g. openai, anthropic. |
| Agent-Region | The region the request resolved to under the project's regional policy. |
| Agent-QoS-Admission / Agent-QoS-Target-Met / Agent-QoS-Fallback-Used | The compact QoS outcome. |
| Agent-Idempotent-Replay | true if this response was replayed from a prior idempotent call. |
| Agent-Hints-Honored | Comma-separated sections the platform honored, e.g. qos,reuse. |
The full QoS outcome and reuse waterfall are not in the /v1 body - they live on
/v2/usage, keyed by Agent-Trace-Id.
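Reading the response headers listed above can be sketched as a few pure functions over any header mapping. The helper names are hypothetical and the sample values are invented; only the header names come from the table.

```python
from typing import Dict, List, Mapping


def extract_agent_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Collect Agent-* headers, lowercasing names since HTTP headers are case-insensitive."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower().startswith("agent-")
    }


def was_replayed(agent: Mapping[str, str]) -> bool:
    """True when Agent-Idempotent-Replay is present and set to true."""
    return agent.get("agent-idempotent-replay", "").strip().lower() == "true"


def honored_sections(agent: Mapping[str, str]) -> List[str]:
    """Split the comma-separated Agent-Hints-Honored value into section names."""
    raw = agent.get("agent-hints-honored", "")
    return [part.strip() for part in raw.split(",") if part.strip()]


# Invented sample values for illustration.
sample_headers = {
    "Content-Type": "application/json",
    "Agent-Trace-Id": "trc_example",
    "Agent-Resolved-Provider": "openai",
    "agent-idempotent-replay": "true",
    "Agent-Hints-Honored": "qos, reuse",
}
agent = extract_agent_headers(sample_headers)
print(agent["agent-trace-id"], was_replayed(agent), honored_sections(agent))
```

With the OpenAI Python SDK, the raw headers are available by calling `client.chat.completions.with_raw_response.create(...)`; the returned object exposes `.headers`, and `.parse()` yields the usual completion.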
Testing checklist
Swap the base URL and key
Point the client at https://api.zumik.ai/v1 with a zk_live_… key. Confirm a basic
chat.completions.create returns content and a populated usage object.
Diff the response shape
Assert your existing response parsing still passes - choices[0].message.content, finish_reason,
and usage.total_tokens are all present and identically typed.
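The shape check in this step could be automated with a helper along these lines. It is a sketch under one assumption: the reply is plain text, so `content` is a string (it can legitimately be null for tool-call replies). `check_chat_completion_shape` is a hypothetical name, and the sample response is hand-written.

```python
from typing import List


def check_chat_completion_shape(resp: dict) -> List[str]:
    """Return a list of problems; an empty list means the checked fields look right."""
    problems = []
    choices = resp.get("choices")
    if not isinstance(choices, list) or not choices:
        problems.append("choices missing or empty")
    else:
        first = choices[0] if isinstance(choices[0], dict) else {}
        message = first.get("message") or {}
        if not isinstance(message.get("content"), str):
            problems.append("choices[0].message.content is not a string")
        if not isinstance(first.get("finish_reason"), str):
            problems.append("choices[0].finish_reason is not a string")
    usage = resp.get("usage") or {}
    if not isinstance(usage.get("total_tokens"), int):
        problems.append("usage.total_tokens is not an integer")
    return problems


# Hand-written sample in the Chat Completions response shape.
sample = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The test fails on a null id."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20},
}
print(check_chat_completion_shape(sample))
```

Running the same helper against a recorded OpenAI response and a Zumik response gives a quick parity check before switching traffic.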
Exercise streaming
Set stream: true. Confirm chat.completion.chunk frames arrive and the stream ends with
data: [DONE]. Set stream_options.include_usage: true and confirm the trailing usage chunk.
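For clients that read the event stream directly, the streaming check could be sketched as below. It assumes the standard OpenAI framing: one `data:` line per chunk, a final usage-only chunk with an empty `choices` array when `include_usage` is set, and a closing `data: [DONE]`. `consume_stream` is a hypothetical helper and the frames are hand-written samples.

```python
import json
from typing import Iterable, Optional, Tuple


def consume_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict], bool]:
    """Return (concatenated text, trailing usage or None, whether [DONE] was seen)."""
    text_parts = []
    usage = None
    saw_done = False
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            saw_done = True
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text_parts.append(delta["content"])
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage, saw_done


# Hand-written sample frames.
frames = [
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":5,"completion_tokens":2,"total_tokens":7}}',
    "",
    "data: [DONE]",
]
text, usage, saw_done = consume_stream(frames)
print(text, usage, saw_done)
```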
Check the Agent-* headers
Read Agent-Trace-Id, Agent-Execution-Profile, and Agent-Resolved-Provider off a response to
confirm signal is flowing, then look the trace up on /v2/usage.
Confirm error parity
Trigger a 429 (a tight per-key budget) and a 400 (empty messages). Confirm the error body is
OpenAI-shaped and your retry logic treats the 429 as a quota error.
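One way to pin down the retry behavior being tested is a small classifier plus a capped exponential backoff. This is a sketch of a common client-side pattern, not something this page prescribes: the category labels, the base delay, and the cap are all assumptions to adjust for your own client.

```python
def classify_status(status: int) -> str:
    """Map an HTTP status to a coarse category for retry decisions."""
    if status == 429:
        return "quota"            # budget or rate limit hit; back off before retrying
    if status == 400:
        return "invalid_request"  # e.g. empty messages; retrying will not help
    if 500 <= status < 600:
        return "server"
    return "other"


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based), doubling up to a cap."""
    return min(cap, base * (2 ** attempt))


for status in (429, 400):
    print(status, classify_status(status))
print([backoff_delay(n) for n in range(6)])
```

Production clients usually add random jitter to the delay so that many workers do not retry in lockstep.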
Try an alias
Swap a concrete model id for an alias like code.balanced and confirm it resolves. The resolved
target comes back on Agent-Resolved-Provider.