Zumik
Guides

QoS outcomes

Declare a QoS class and targets, read the formal qos_outcome with its admission, completion, and reason codes, and carry the same intent on OpenAI-compatible traffic through response headers and Agent Hints.

Quality of Service in Zumik is a contract: you declare what the request needs - a class, a first-token target, a hard deadline - and every response tells you, formally, what happened. The outcome is first-class JSON on /v2 and rides on headers on /v1, so a stock OpenAI client never has to guess.

Declare QoS

On /v2/responses, attach a qos block:

FieldTypeNotes
classenuminteractive, standard (default), background, batch. Sets admission priority and scheduling intent.
target_ttft_msintegerSoft first-token target. Reported as met or missed; not enforced as an abort.
deadline_msintegerHard bound. If the provider runs past it, the request aborts with 504 deadline_exceeded and is not charged.
priorityintegerAdmission priority within the class.
degrade_policyenumforbid, or allow_compatible_fallback to permit a compatible degraded path under load.
curl https://api.zumik.ai/v2/responses \
  -H "Authorization: Bearer zk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "code.fast",
    "input": "Summarize the failing test.",
    "qos": { "class": "interactive", "target_ttft_ms": 500, "deadline_ms": 5000, "degrade_policy": "allow_compatible_fallback" }
  }'

Read the outcome

/v2 returns a formal qos_outcome in the body:

{
  "execution_profile": "managed_provider",
  "qos_outcome": {
    "admission": "admitted",
    "completion": "completed",
    "target_met": true,
    "ttft_ms": 382,
    "latency_ms": 2710,
    "deadline_met": true,
    "degraded": false,
    "fallback_used": false,
    "reason_code": null
  }
}
FieldValues
admissionadmitted, queued, rejected, expired_before_start.
completioncompleted, failed, cancelled, expired_during_execution.
target_metWhether the target_ttft_ms target was met (null when none was set).
deadline_metWhether the deadline_ms bound held (null when none was set).
degradedTrue if the request ran on a degraded path.
fallback_usedTrue if a compatible fallback profile served it.
reason_codeWhy a target was missed or the request degraded (below).

Reason codes

When something falls short, reason_code says why - so you can fix the cause, not guess:

CodeMeaning
queue_saturationAdmission queue was full; the request waited or was rejected.
provider_rate_limitThe provider throttled the call.
provider_timeoutThe provider did not respond in time.
region_unavailableNo allowed region could serve it.
alias_no_compatible_targetNo target in the alias release matched the constraints.
cache_missExpected reuse did not materialize.
cache_transfer_slower_than_recomputeLoading the cached KV would have been slower than recomputing.
fallback_profile_usedA compatible fallback profile served the request.
customer_deadline_too_shortThe deadline_ms was too tight for any path to meet.

On OpenAI-compatible traffic

/v1 keeps the body OpenAI-shaped, so the same signal rides on response headers:

HeaderMaps to
Agent-QoS-Admissionadmitted / queued / rejected / expired_before_start.
Agent-QoS-Target-Mettrue/false when a target was set.
Agent-QoS-Fallback-Usedtrue/false - whether a compatible fallback profile served it.
Agent-Trace-Idtrc_… - look up the full outcome on /v2/usage.
Agent-Execution-Profilemanaged_provider / byok / subscription.

The full per-request outcome (every field above) is recorded as a usage event, so a /v1 client gets header-level signal in the moment and the complete record on /v2/usage by trace id.

Agent Hints

To carry QoS - and reuse, routing, and safety intent - on /v1 without leaving the OpenAI body shape, use the vendor-neutral Agent Hints contract. It expresses intent without exposing provider- or engine-specific knobs.

Store a hints object once and reference it by id:

curl https://api.zumik.ai/v2/agent-hints \
  -H "Authorization: Bearer zk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "version": "2026-06-01",
    "session": { "session_id": "ses_…", "branch_id": "br_…", "expected_branch_version": 17 },
    "reuse": { "preference": "prefer", "scope": "project" },
    "qos": { "class": "interactive", "target_ttft_ms": 500, "deadline_ms": 5000, "degrade_policy": "allow_compatible_fallback" },
    "routing": { "region_policy": "us_only", "execution_profiles": ["managed_provider", "byok"], "data_boundary": "project" }
  }'
# => { "id": "ah_…", "object": "agent_hints", ... }

Then attach it to any /v1 request two ways:

  • Agent-Hints-Ref: ah_… - reference the stored object.
  • Agent-Hints: <base64url> - inline a base64url-encoded hints JSON for a one-off request.

Zumik reports which sections it honored on Agent-Hints-Honored (e.g. qos,reuse). The contract is versioned (version), so a hint your runtime doesn't understand is ignored rather than rejected.

On this page