QoS outcomes
Declare a QoS class and targets, read the formal qos_outcome with its admission, completion, and reason codes, and carry the same intent on OpenAI-compatible traffic through response headers and Agent Hints.
Quality of Service in Zumik is a contract: you declare what the request needs - a class, a first-token
target, a hard deadline - and every response tells you, formally, what happened. The outcome is
first-class JSON on /v2 and rides on headers on /v1, so a stock OpenAI client never has to guess.
Declare QoS
On /v2/responses, attach a qos block:
| Field | Type | Notes |
|---|---|---|
| class | enum | interactive, standard (default), background, batch. Sets admission priority and scheduling intent. |
| target_ttft_ms | integer | Soft first-token target. Reported as met or missed; not enforced as an abort. |
| deadline_ms | integer | Hard bound. If the provider runs past it, the request aborts with 504 deadline_exceeded and is not charged. |
| priority | integer | Admission priority within the class. |
| degrade_policy | enum | forbid, or allow_compatible_fallback to permit a compatible degraded path under load. |
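The fields in the table above can be assembled and sanity-checked on the client before the request is sent. The following is a minimal sketch, not part of any Zumik SDK: `build_qos` is a hypothetical helper, and the check that the soft target does not exceed the hard deadline is an assumption about what callers usually intend, not a documented server rule.

```python
from typing import Any, Dict, Optional

# Enum values taken from the field table above.
QOS_CLASSES = {"interactive", "standard", "background", "batch"}
DEGRADE_POLICIES = {"forbid", "allow_compatible_fallback"}


def build_qos(
    qos_class: str = "standard",
    target_ttft_ms: Optional[int] = None,
    deadline_ms: Optional[int] = None,
    priority: Optional[int] = None,
    degrade_policy: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a qos block, omitting any optional field that was not set."""
    if qos_class not in QOS_CLASSES:
        raise ValueError(f"unknown QoS class: {qos_class!r}")
    if degrade_policy is not None and degrade_policy not in DEGRADE_POLICIES:
        raise ValueError(f"unknown degrade_policy: {degrade_policy!r}")
    # Assumption: a soft first-token target later than the hard deadline
    # is a caller mistake, so it is rejected locally.
    if (
        target_ttft_ms is not None
        and deadline_ms is not None
        and target_ttft_ms > deadline_ms
    ):
        raise ValueError("target_ttft_ms exceeds deadline_ms")
    qos = {
        "class": qos_class,
        "target_ttft_ms": target_ttft_ms,
        "deadline_ms": deadline_ms,
        "priority": priority,
        "degrade_policy": degrade_policy,
    }
    return {key: value for key, value in qos.items() if value is not None}
```

The returned dictionary can be placed under the `qos` key of a /v2/responses request body.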
curl https://api.zumik.ai/v2/responses \
  -H "Authorization: Bearer zk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "code.fast",
    "input": "Summarize the failing test.",
    "qos": { "class": "interactive", "target_ttft_ms": 500, "deadline_ms": 5000, "degrade_policy": "allow_compatible_fallback" }
  }'
Read the outcome
/v2 returns a formal qos_outcome in the body:
{
  "execution_profile": "managed_provider",
  "qos_outcome": {
    "admission": "admitted",
    "completion": "completed",
    "target_met": true,
    "ttft_ms": 382,
    "latency_ms": 2710,
    "deadline_met": true,
    "degraded": false,
    "fallback_used": false,
    "reason_code": null
  }
}

| Field | Values |
|---|---|
| admission | admitted, queued, rejected, expired_before_start. |
| completion | completed, failed, cancelled, expired_during_execution. |
| target_met | Whether the target_ttft_ms target was met (null when none was set). |
| deadline_met | Whether the deadline_ms bound held (null when none was set). |
| degraded | True if the request ran on a degraded path. |
| fallback_used | True if a compatible fallback profile served it. |
| reason_code | Why a target was missed or the request degraded (below). |
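A client will often want to collapse these fields into a single verdict. The sketch below shows one possible reading of a `qos_outcome` object; the function name and the verdict labels are hypothetical, and treating `queued` as "eventually started" is an assumption rather than documented behavior.

```python
from typing import Any, Dict


def summarize_outcome(outcome: Dict[str, Any]) -> str:
    """Reduce a qos_outcome object to one coarse verdict string."""
    # The request never began executing.
    if outcome["admission"] in ("rejected", "expired_before_start"):
        return "not_started"
    # Assumption: "admitted" and "queued" both mean execution began at some point.
    if outcome["completion"] != "completed":
        return "not_completed"
    # target_met is null (None) when no target was set, so compare to False.
    if outcome.get("target_met") is False:
        return "completed_missed_target"
    if outcome.get("degraded") or outcome.get("fallback_used"):
        return "completed_degraded"
    return "ok"


# The sample outcome from the response body shown above.
sample = {
    "admission": "admitted",
    "completion": "completed",
    "target_met": True,
    "ttft_ms": 382,
    "latency_ms": 2710,
    "deadline_met": True,
    "degraded": False,
    "fallback_used": False,
    "reason_code": None,
}
verdict = summarize_outcome(sample)
```

For any verdict other than `ok`, the `reason_code` field carries the cause.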
Reason codes
When something falls short, reason_code says why - so you can fix the cause, not guess:
| Code | Meaning |
|---|---|
| queue_saturation | Admission queue was full; the request waited or was rejected. |
| provider_rate_limit | The provider throttled the call. |
| provider_timeout | The provider did not respond in time. |
| region_unavailable | No allowed region could serve it. |
| alias_no_compatible_target | No target in the alias release matched the constraints. |
| cache_miss | Expected reuse did not materialize. |
| cache_transfer_slower_than_recompute | Loading the cached KV would have been slower than recomputing. |
| fallback_profile_used | A compatible fallback profile served the request. |
| customer_deadline_too_short | The deadline_ms was too tight for any path to meet. |
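One way to act on these codes is to group them by the kind of response they suggest. The grouping below is an illustrative client-side policy, not something Zumik prescribes: which codes are worth retrying, and the action labels themselves, are assumptions to adapt to your workload.

```python
from typing import Optional

# Assumption: transient capacity or provider problems are worth retrying later.
RETRY_WITH_BACKOFF = {"queue_saturation", "provider_rate_limit", "provider_timeout"}
# Assumption: these point at the request's own constraints (regions, alias, deadline).
ADJUST_REQUEST = {
    "region_unavailable",
    "alias_no_compatible_target",
    "customer_deadline_too_short",
}
# These explain a slower or degraded result; nothing needs to be resent.
INFORMATIONAL = {
    "cache_miss",
    "cache_transfer_slower_than_recompute",
    "fallback_profile_used",
}


def next_step(reason_code: Optional[str]) -> str:
    """Map a reason_code to a coarse client action (hypothetical labels)."""
    if reason_code is None:
        return "none"
    if reason_code in RETRY_WITH_BACKOFF:
        return "retry_with_backoff"
    if reason_code in ADJUST_REQUEST:
        return "adjust_request"
    if reason_code in INFORMATIONAL:
        return "log_only"
    # Codes added after this sketch was written fall through here.
    return "unknown"
```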
On OpenAI-compatible traffic
/v1 keeps the body OpenAI-shaped, so the same signal rides on response headers:
| Header | Maps to |
|---|---|
| Agent-QoS-Admission | admitted / queued / rejected / expired_before_start. |
| Agent-QoS-Target-Met | true/false when a target was set. |
| Agent-QoS-Fallback-Used | true/false - whether a compatible fallback profile served it. |
| Agent-Trace-Id | trc_… - look up the full outcome on /v2/usage. |
| Agent-Execution-Profile | managed_provider / byok / subscription. |
The full per-request outcome (every field above) is recorded as a usage event, so a /v1 client gets
header-level signal in the moment and the complete record on /v2/usage by trace id.
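The header mapping above can be read back into a small dictionary on the client. This is a sketch under a few assumptions: header values are the literal strings `true` and `false`, a header is simply absent when it does not apply, and `qos_from_headers` is a hypothetical helper name.

```python
from typing import Dict, Mapping, Optional, Union


def parse_bool(value: Optional[str]) -> Optional[bool]:
    """Turn 'true'/'false' into a bool; anything else (or absent) becomes None."""
    if value is None:
        return None
    lowered = value.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    return None


def qos_from_headers(headers: Mapping[str, str]) -> Dict[str, Optional[Union[str, bool]]]:
    """Collect the QoS response headers from a /v1 response."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "admission": lowered.get("agent-qos-admission"),
        "target_met": parse_bool(lowered.get("agent-qos-target-met")),
        "fallback_used": parse_bool(lowered.get("agent-qos-fallback-used")),
        "trace_id": lowered.get("agent-trace-id"),
        "execution_profile": lowered.get("agent-execution-profile"),
    }
```

The `trace_id` value is what you would use to look up the complete record on /v2/usage.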
Agent Hints
To carry QoS - and reuse, routing, and safety intent - on /v1 without leaving the OpenAI body shape,
use the vendor-neutral Agent Hints contract. It expresses intent without exposing provider- or
engine-specific knobs.
Store a hints object once and reference it by id:
curl https://api.zumik.ai/v2/agent-hints \
  -H "Authorization: Bearer zk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "version": "2026-06-01",
    "session": { "session_id": "ses_…", "branch_id": "br_…", "expected_branch_version": 17 },
    "reuse": { "preference": "prefer", "scope": "project" },
    "qos": { "class": "interactive", "target_ttft_ms": 500, "deadline_ms": 5000, "degrade_policy": "allow_compatible_fallback" },
    "routing": { "region_policy": "us_only", "execution_profiles": ["managed_provider", "byok"], "data_boundary": "project" }
  }'
# => { "id": "ah_…", "object": "agent_hints", ... }
Then attach it to any /v1 request in one of two ways:
- Agent-Hints-Ref: ah_… - reference the stored object.
- Agent-Hints: <base64url> - inline a base64url-encoded hints JSON for a one-off request.
Zumik reports which sections it honored in the Agent-Hints-Honored header (e.g. qos,reuse). The contract is
versioned (version), so a hint your runtime doesn't understand is ignored rather than rejected.
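For the inline form, the hints JSON has to be base64url-encoded before it goes into the Agent-Hints header, and the Agent-Hints-Honored value has to be split back into section names. The sketch below uses only the Python standard library. The source says only "base64url", so serializing compact JSON and stripping the trailing `=` padding are assumptions, and all three helper names are hypothetical.

```python
import base64
import json
from typing import Any, Dict, Set


def encode_hints(hints: Dict[str, Any]) -> str:
    """Serialize a hints object and base64url-encode it for the Agent-Hints header."""
    raw = json.dumps(hints, separators=(",", ":")).encode("utf-8")
    # Assumption: padding is stripped, as is common for base64url in headers.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_hints(value: str) -> Dict[str, Any]:
    """Inverse of encode_hints; restores the padding before decoding."""
    padded = value + "=" * (-len(value) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def honored_sections(header_value: str) -> Set[str]:
    """Split an Agent-Hints-Honored value such as 'qos,reuse' into section names."""
    return {part.strip() for part in header_value.split(",") if part.strip()}


hints = {
    "version": "2026-06-01",
    "qos": {"class": "interactive", "target_ttft_ms": 500, "deadline_ms": 5000},
}
header_value = encode_hints(hints)
```

Comparing `honored_sections(...)` against the sections you sent shows which parts of the hint were ignored.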