OpenAI migration guide

Point an existing OpenAI client at Zumik with no body changes - the full /v1 compatibility surface, the base-url swap, what carries over unchanged, the optional Agent-* headers, and a testing checklist.

Zumik's /v1 is a strict OpenAI-compatible surface. Swap the base URL and your existing client keeps working: the request and response JSON stays exactly OpenAI-shaped. Everything proprietary - reuse signals, execution profile, QoS outcome, idempotent-replay markers - rides on optional Agent-* headers, never in the body. So a stock OpenAI SDK reads a Zumik response the same way it reads an OpenAI one, and the extra signal is there for clients that look for it.

The base-url swap

Set the base URL to https://api.zumik.ai/v1 and use a Zumik API key as the bearer token. No other code changes are required.

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.zumik.ai/v1",
    api_key="zk_live_...",  # a Zumik key, not an OpenAI key
)

r = client.chat.completions.create(
    model="gpt-4o",  # or a Zumik alias like code.balanced
    messages=[{"role": "user", "content": "Summarize the failing test."}],
)
print(r.choices[0].message.content)
print(r.usage.prompt_tokens_details.cached_tokens)  # reuse stays visible

The API key is a Zumik key (zk_live_…), not your provider key. To call providers with your own credential instead, attach a BYOK key - the base url and body stay the same.

The compatibility surface

These /v1 endpoints are live today and accept the exact OpenAI request shapes:

Endpoint	Notes
`POST /v1/chat/completions`	Chat Completions, including `stream` and `stream_options.include_usage`. See Streaming.
`POST /v1/responses`	Responses API. `input` accepts a bare string or the structured input-item array.
`GET /v1/responses/{id}`	Retrieve a stored response.
`POST /v1/responses/{id}/cancel`	Cancel an in-flight response.
`POST /v1/responses/input_tokens`	Count input tokens without generating.
`POST /v1/embeddings`	Embeddings, OpenAI request and response shape.
`POST /v1/batches`	Batch jobs over an uploaded input file. See the Files API.
`GET /v1/models`	Returns the minimal OpenAI model-object shape (`id`, `object`, `created`, `owned_by`).
`GET /v1/models/{id}`	Retrieve a single model object.

Keep /v1 bodies clean. Do not add proprietary JSON fields to an OpenAI-compatible request or response - it breaks the contract and a strict client will reject it. When you need Zumik-specific behavior, use an Agent-* header or move to the native /v2 surface.

What carries over unchanged

Request and response body shapes. Same fields, same nesting, same types as OpenAI.
SDK usage patterns. The official OpenAI Python and TypeScript SDKs work with only the base-url and key change.
Bearer authentication. Authorization: Bearer <key> exactly as before. See Authentication.
usage reporting. Including usage.prompt_tokens_details.cached_tokens, so prompt-cache capture stays measurable with a vanilla client - see Prompt caching.
Streaming. stream: true works end to end, with the trailing usage chunk gated on stream_options.include_usage.
The OpenAI error envelope. Errors return { "error": { "message", "type", "code", "param" } }, so existing error handling and retry/backoff logic behaves identically. See Errors.

What changes

Base URL points to https://api.zumik.ai/v1.
Models can be aliases. Alongside concrete model ids you can pass a Zumik alias like code.fast or auto.balanced, which resolves to an immutable release. See Model aliases.
Stateful and diagnostic features live under /v2 - sessions, branches, snapshots, diagnostics, replay, and purge receipts are native /v2 objects, not /v1 extensions.

Optional Agent-* headers

Proprietary behavior travels on headers so the body stays OpenAI-exact. You never need these to get a working response; they are there when you want control or signal.

Send these on a request:

Header	Purpose
`Agent-Hints-Ref: ah_…`	Reference a stored Agent Hints object (QoS, reuse, routing, safety intent).
`Agent-Hints: <base64url>`	Inline a base64url-encoded hints JSON for a one-off request.
`Agent-Idempotency-Key: <key>`	Make a mutating request safely retryable. See Idempotency and retries.

Read these off a response:

Header	Conveys
`Agent-Trace-Id: trc_…`	The trace id; look up the full per-request record on `/v2/usage`.
`Agent-Execution-Profile`	`managed_provider`, `byok`, `subscription`, `byoc_dynamo`, or `byoc_epp`.
`Agent-Resolved-Provider`	The provider that served the request, e.g. `openai`, `anthropic`.
`Agent-Region`	The region the request resolved to under the project's regional policy.
`Agent-QoS-Admission` / `Agent-QoS-Target-Met` / `Agent-QoS-Fallback-Used`	The compact QoS outcome.
`Agent-Idempotent-Replay`	`true` if this response was replayed from a prior idempotent call.
`Agent-Hints-Honored`	Comma-separated sections the platform honored, e.g. `qos,reuse`.

The full QoS outcome and reuse waterfall are not in the /v1 body - they live on /v2/usage, keyed by Agent-Trace-Id.

Testing checklist

Swap the base URL and key

Point the client at https://api.zumik.ai/v1 with a zk_live_… key. Confirm a basic chat.completions.create returns content and a populated usage object.

Diff the response shape

Assert your existing response parsing still passes - choices[0].message.content, finish_reason, and usage.total_tokens are all present and identically typed.

Exercise streaming

Set stream: true. Confirm chat.completion.chunk frames arrive and the stream ends with data: [DONE]. Set stream_options.include_usage: true and confirm the trailing usage chunk.

Check the Agent-* headers

Read Agent-Trace-Id, Agent-Execution-Profile, and Agent-Resolved-Provider off a response to confirm signal is flowing, then look the trace up on /v2/usage.

Confirm error parity

Trigger a 429 (a tight per-key budget) and a 400 (empty messages). Confirm the error body is OpenAI-shaped and your retry logic treats the 429 as a quota error.

Try an alias

Swap a concrete model id for an alias like code.balanced and confirm it resolves. The resolved target comes back on Agent-Resolved-Provider.

OpenAI compatibility reference

The endpoint-by-endpoint compatibility contract.

Streaming

SSE chunk shape, the usage chunk, and the buffered-streaming note.