Idempotency and retries

Make mutating requests safely retryable with Agent-Idempotency-Key, understand the three retry types, and keep tool side effects safe when a request is replayed.

A network blip should not double-bill a generation or fire a tool twice. Zumik lets every mutating request carry an idempotency key and separates three distinct kinds of retry, each with its own rules. The guiding principle is simple: a logical response can have many internal attempts, but its observable side effects happen exactly once.

Agent-Idempotency-Key

Send Agent-Idempotency-Key on a mutating request. If the same key arrives again, Zumik replays the original result instead of executing a second time.

curl https://api.zumik.ai/v1/chat/completions \
  -H "Authorization: Bearer zk_live_..." \
  -H "Content-Type: application/json" \
  -H "Agent-Idempotency-Key: 5f3c9b2a-task-4821" \
  -d '{"model":"code.fast","messages":[{"role":"user","content":"Run the fix."}]}'

How it behaves:

  • First call runs normally and the response is cached against the key. The response carries Agent-Idempotent-Replay: false.

  • Replay of the same key returns the original response verbatim, with Agent-Idempotent-Replay: true. You are not charged again.

  • Scope is the project plus the request path plus the key, so a key is local to one endpoint in one project and cannot collide across tenants.

  • Retention of a completed result is 24 hours; an in-flight reservation lasts up to 60 seconds.

  • Concurrent duplicate - the same key arriving while the first is still in flight - returns 409 idempotency_conflict:

    {
      "error": {
        "message": "A request with this Agent-Idempotency-Key is already in progress. Retry after it completes.",
        "type": "idempotency_conflict"
      }
    }

Only successful (2xx) responses are cached. A transient 4xx/5xx releases the key so a genuine retry can proceed - you do not get stuck replaying a failure. Use a fresh, stable key per logical operation (a task id works well); reusing one key for two different operations replays the first operation's result instead of running the second.
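The behavior described above can be pictured with a toy in-memory model. This is only a sketch of the documented semantics, not Zumik's implementation: the class name `IdempotencyStore`, its `begin`/`finish` methods, and the injectable `clock` are all hypothetical names introduced for illustration.

```python
import time

RESULT_TTL = 24 * 60 * 60   # completed results are retained for 24 hours
RESERVATION_TTL = 60        # an in-flight reservation lasts up to 60 seconds


class IdempotencyStore:
    """Toy model of the key semantics (hypothetical, for illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}

    def begin(self, project, path, key):
        """Return ("execute" | "replay" | "conflict", cached_response_or_None)."""
        scope = (project, path, key)  # scope = project + request path + key
        now = self._clock()
        entry = self._entries.get(scope)
        if entry is not None and entry["expires"] > now:
            if entry["state"] == "done":
                return "replay", entry["response"]   # Agent-Idempotent-Replay: true
            return "conflict", None                  # 409 idempotency_conflict
        # No live entry: reserve the key while the first call is in flight.
        self._entries[scope] = {"state": "in_flight", "expires": now + RESERVATION_TTL}
        return "execute", None

    def finish(self, project, path, key, status, response):
        """Cache a 2xx result; release the key on any other status."""
        scope = (project, path, key)
        if 200 <= status < 300:
            self._entries[scope] = {
                "state": "done",
                "response": response,
                "expires": self._clock() + RESULT_TTL,
            }
        else:
            self._entries.pop(scope, None)  # a genuine retry can proceed
```

Because the scope tuple includes the project and the path, the same key string used in another project or on another endpoint is treated as an unrelated operation.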

The three retry types

Not every retry is the same, and treating them alike is how you double-charge or duplicate a tool call. Zumik distinguishes:

Type | When | Rule
-----|------|-----
transport_retry | A transient network failure - the request may or may not have reached us | Safe only with an idempotency key. Without one, you risk a duplicate generation.
provider_failover | A provider or region failed mid-request | Handled internally. The platform tries another path and records a new attempt; you do not re-send.
semantic_retry | You deliberately want a different generation | A brand-new request and a new response object. Use a new idempotency key.

A transport_retry is the one you control as a client: attach the same Agent-Idempotency-Key and the retry is safe. A provider_failover happens below your call - the broker fails over to another path (and, only under explicit policy, to the OpenRouter emergency fallback) and records the cause, provider, and timing per attempt. A semantic_retry is not a retry of the same operation at all; it is a new operation that gets its own response.
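A client-side transport_retry might look like the sketch below. It assumes a caller-supplied `send(headers, body)` function that returns `(status, response_headers, payload)` and raises `TransportError` on a network failure; both of those, along with `send_with_retry` itself and the backoff values, are hypothetical and not part of any Zumik SDK. Retrying on 409 and 5xx is one reasonable policy, not a documented requirement.

```python
import time


class TransportError(Exception):
    """Hypothetical error raised by `send` when the network fails."""


def send_with_retry(send, body, key, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Re-send one logical operation, always with the same idempotency key."""
    headers = {"Agent-Idempotency-Key": key}  # identical on every attempt
    for attempt in range(max_attempts):
        try:
            status, resp_headers, payload = send(headers, body)
        except TransportError:
            status = None  # unknown whether the request reached the server
        # Anything other than a network failure, an in-flight conflict (409),
        # or a 5xx is treated as final and handed back to the caller.
        if status is not None and status != 409 and status < 500:
            return status, resp_headers, payload
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # simple exponential backoff
    raise RuntimeError(f"gave up after {max_attempts} attempts for key {key!r}")
```

A semantic_retry would not go through this helper with the old key; it is a new operation and would be called with a new one.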

Do not auto-replay a generation after observable streamed output has already reached the user unless the path supports resumability. On a stream disconnect, retrieve the response status or resume from an event cursor rather than re-issuing the generation. See Streaming.

Tool-side-effect safety

The hard part of retries is tools with side effects - creating a record, sending a message, charging a card. A retried or failed-over request must never run those twice. The rules:

Give every tool call a stable call id

A retry reuses the same call id, so a downstream system can dedupe on it rather than acting twice.
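On the receiving side, deduping on the call id can be as small as the sketch below. `ToolExecutor` and its in-memory dictionary are hypothetical; a real downstream system would likely persist the ids in durable storage and guard against concurrent calls.

```python
class ToolExecutor:
    """Runs each tool call at most once per call id (illustrative sketch)."""

    def __init__(self):
        self._results = {}  # call id -> result of the first execution

    def run(self, call_id, tool, *args):
        # A retried or failed-over attempt arrives with the same call id,
        # so it gets the recorded result instead of a second side effect.
        if call_id in self._results:
            return self._results[call_id]
        result = tool(*args)
        self._results[call_id] = result
        return result
```

For example, calling `run("call-1", send_message, "hello")` twice would invoke the hypothetical `send_message` only the first time.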

Commit assistant output to history exactly once

A logical response may have several internal attempts, but it is written to the session event line only once - the session never records a generation twice.

Require external commit for side-effecting tools

Express it in Agent Hints with safety.tool_side_effect_mode: "external_commit_required" so an effect is committed by your system after confirmation, not implicitly during a retried attempt.

Declare retry safety

Set safety.retry_safety: "idempotent_generation_only" so the platform only ever retries a generation it can prove is safe to repeat.
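Taken together, the two safety settings could be assembled as below. The nesting under a `safety` object is inferred from the dotted names in the text, and how Agent Hints are attached to a request is not shown here, so treat the shape as an assumption rather than the documented wire format.

```python
import json

# Assumed shape: dotted names such as safety.retry_safety map to nested keys.
agent_hints = {
    "safety": {
        "tool_side_effect_mode": "external_commit_required",
        "retry_safety": "idempotent_generation_only",
    }
}

print(json.dumps(agent_hints, indent=2))
```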

A logical response is the unit of meaning; attempts are the unit of execution. Keep side effects bound to the logical response and a retry stays invisible to everything downstream.