Idempotency and retries
Make mutating requests safely retryable with Agent-Idempotency-Key, understand the three retry types, and keep tool side effects safe when a request is replayed.
A network blip should not double-bill a generation or fire a tool twice. Zumik lets every mutating request carry an idempotency key and separates three distinct kinds of retry, each with its own rules. The guiding principle is simple: a logical response can have many internal attempts, but its observable side effects happen exactly once.
Agent-Idempotency-Key
Send Agent-Idempotency-Key on a mutating request. If the same key arrives again, Zumik replays the
original result instead of executing a second time.
curl https://api.zumik.ai/v1/chat/completions \
-H "Authorization: Bearer zk_live_..." \
-H "Content-Type: application/json" \
-H "Agent-Idempotency-Key: 5f3c9b2a-task-4821" \
-d '{"model":"code.fast","messages":[{"role":"user","content":"Run the fix."}]}'
How it behaves:
- First call runs normally and the response is cached against the key. The response carries Agent-Idempotent-Replay: false.
- Replay of the same key returns the original response verbatim, with Agent-Idempotent-Replay: true. You are not charged again.
- Scope is the project plus the request path plus the key, so a key is local to one endpoint in one project and cannot collide across tenants.
- Retention of a completed result is 24 hours; an in-flight reservation lasts up to 60 seconds.
- Concurrent duplicate (the same key arriving while the first is still in flight) returns 409 idempotency_conflict:
{ "error": { "message": "A request with this Agent-Idempotency-Key is already in progress. Retry after it completes.", "type": "idempotency_conflict" } }
Only successful (2xx) responses are cached. A transient 4xx/5xx releases the key so a genuine
retry can proceed; you do not get stuck replaying a failure. Use a fresh, stable key per logical
operation (a task id works well); reusing one key for two different operations replays the first operation's result instead of running the second.
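The rules above can be modelled with a small in-memory store. This is only a toy sketch for reasoning about the behaviour, not Zumik's implementation: the class `IdempotencyStore`, its `begin`/`finish` methods, and the returned labels are all hypothetical names, and only the 24-hour and 60-second windows come from the text.

```python
import time

RETENTION_S = 24 * 60 * 60  # completed results are kept for 24 hours
RESERVATION_S = 60          # an in-flight reservation lasts up to 60 seconds


class IdempotencyStore:
    """Toy model of the documented key semantics (hypothetical, in-memory)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}

    def begin(self, project, path, key):
        """Return ("run", None), ("replay", response) or ("conflict", None)."""
        scope = (project, path, key)  # scope = project + request path + key
        entry = self._entries.get(scope)
        now = self._clock()
        if entry is not None and entry["expires"] > now:
            if entry["state"] == "done":
                return "replay", entry["response"]
            return "conflict", None  # first request is still in flight
        # No live entry: reserve the key and let the request execute.
        self._entries[scope] = {"state": "in_flight",
                                "expires": now + RESERVATION_S}
        return "run", None

    def finish(self, project, path, key, status, response):
        """Cache a 2xx result; release the key on any other status."""
        scope = (project, path, key)
        if 200 <= status < 300:
            self._entries[scope] = {"state": "done", "response": response,
                                    "expires": self._clock() + RETENTION_S}
        else:
            self._entries.pop(scope, None)
```

In this model a "replay" result corresponds to a response carrying Agent-Idempotent-Replay: true, and a "conflict" result corresponds to the 409 idempotency_conflict error.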
The three retry types
Not every retry is the same, and treating them alike is how you double-charge or duplicate a tool call. Zumik distinguishes:
| Type | When | Rule |
|---|---|---|
| transport_retry | A transient network failure - the request may or may not have reached us | Safe only with an idempotency key. Without one, you risk a duplicate generation. |
| provider_failover | A provider or region failed mid-request | Handled internally. The platform tries another path and records a new attempt; you do not re-send. |
| semantic_retry | You deliberately want a different generation | A brand-new request and a new response object. Use a new idempotency key. |
A transport_retry is the one you control as a client: attach the same Agent-Idempotency-Key and the
retry is safe. A provider_failover happens below your call: the broker fails over to another path
(to the OpenRouter emergency fallback only under explicit policy) and records
the cause, provider, and timing of each attempt. A semantic_retry is not a retry of the same operation at
all; it is a new operation that gets its own response.
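A client-side transport_retry might look like the sketch below. It is illustrative only: `send_with_retry`, `TransportError`, and the injected `send` callable (whatever function actually performs the HTTP POST) are hypothetical names, and the backoff values are arbitrary assumptions. The one point taken from the text is that every attempt reuses the same Agent-Idempotency-Key.

```python
import time
import uuid


class TransportError(Exception):
    """Hypothetical stand-in for a transient network failure."""


def send_with_retry(send, payload, key=None, max_attempts=3,
                    base_delay=0.5, sleep=time.sleep):
    """Retry a mutating request, reusing one idempotency key throughout.

    `send(payload, headers)` is an assumed callable that issues the request
    and raises TransportError when the network fails.
    """
    key = key or str(uuid.uuid4())
    headers = {"Agent-Idempotency-Key": key}  # identical on every attempt
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload, headers)
        except TransportError:
            if attempt == max_attempts:
                raise  # give up; the caller decides what to do next
            sleep(base_delay * 2 ** (attempt - 1))  # simple exponential backoff
```

A semantic_retry, by contrast, would call the helper again with a different key, because it is a new logical operation rather than a repeat of the old one.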
Do not auto-replay a generation after observable streamed output has already reached the user unless the path supports resumability. On a stream disconnect, retrieve the response status or resume from an event cursor rather than re-issuing the generation. See Streaming.
Tool-side-effect safety
The hard part of retries is tools with side effects - creating a record, sending a message, charging a card. A retried or failed-over request must never run those twice. The rules:
Give every tool call a stable call id
A retry reuses the same call id, so a downstream system can dedupe on it rather than acting twice.
Commit assistant output to history exactly once
A logical response may have several internal attempts, but it is written to the session event line only once - the session never records a generation twice.
Require external commit for side-effecting tools
Express it in Agent Hints with
safety.tool_side_effect_mode: "external_commit_required" so an effect is committed by your system
after confirmation, not implicitly during a retried attempt.
Declare retry safety
Set safety.retry_safety: "idempotent_generation_only" so the platform only ever retries a
generation it can prove is safe to repeat.
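Taken together, the two safety settings could be assembled as in the sketch below. It assumes the dotted names map onto a nested `safety` object; how Agent Hints are attached to a request is not covered in this section, so the variable name `agent_hints` and the surrounding structure are illustrative only.

```python
import json

# Assumption: "safety.tool_side_effect_mode" and "safety.retry_safety"
# are fields of a nested "safety" object inside the Agent Hints.
agent_hints = {
    "safety": {
        "tool_side_effect_mode": "external_commit_required",
        "retry_safety": "idempotent_generation_only",
    }
}

# Serialized form, ready to attach wherever your integration sends hints.
agent_hints_json = json.dumps(agent_hints, indent=2)
```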
A logical response is the unit of meaning; attempts are the unit of execution. Keep side effects bound to the logical response and a retry stays invisible to everything downstream.
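The first two rules, deduplicating on a stable call id and committing output once, can be sketched on the consumer side as follows. `ToolExecutor` and `SessionLog` are hypothetical names for your own downstream components, and an in-memory dict stands in for whatever durable store a real system would use.

```python
class ToolExecutor:
    """Runs a side-effecting tool at most once per call id (hypothetical)."""

    def __init__(self):
        self._results = {}

    def execute(self, call_id, effect):
        # A retried or failed-over attempt reuses the same call id, so the
        # effect is skipped and the stored result is returned instead.
        if call_id not in self._results:
            self._results[call_id] = effect()
        return self._results[call_id]


class SessionLog:
    """Records each logical response once, however many attempts it took."""

    def __init__(self):
        self.events = []
        self._committed = set()

    def commit(self, response_id, output):
        """Append the output and return True, or return False if already committed."""
        if response_id in self._committed:
            return False
        self._committed.add(response_id)
        self.events.append((response_id, output))
        return True
```

Keying both structures on identifiers of the logical response, rather than on the attempt, is what keeps a retry invisible downstream.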