Responses
Create, retrieve, and cancel native responses. The /v2 surface pins session state and an explicit QoS request in the body, and reports the formal QoS outcome.
The native /v2/responses surface pins session state and a QoS request directly in the request body, rather than squeezing them through compatibility headers as /v1 does. The response carries the chosen execution profile and the formal QoS outcome inline. Response ids are prefixed rsp_. See QoS.
All requests require a bearer API key. Creating a response runs inference, so the project must have a positive prepaid credit balance and be under its budget. See authentication and billing and budgets.
Create a response
POST /v2/responses
modelstringrequiredThe model alias or concrete target to run, e.g. code.fast. Resolved through an immutable alias release.
inputstringrequiredThe input text. Must not be empty.
session_idstringOptional ses_... to associate the response with a session. Must exist in this project.
branch_idstringOptional br_... branch within the session. Must resolve through a session this project owns.
qosobjectOptional explicit QoS request. Defaults to the standard class with no hard targets and compatible fallback allowed.
qos
classstringrequiredOne of interactive, standard, background, batch.
target_ttft_msintegerTime-to-first-token target. The outcome reports target_met against it.
deadline_msintegerHard deadline. If the provider does not respond within this window the request aborts with deadline_exceeded (504) and is never charged.
priorityintegerScheduling priority hint, 0-255.
degrade_policystringrequiredOne of forbid, allow_compatible_fallback.
curl https://api.zumik.ai/v2/responses \
-H "Authorization: Bearer $ZUMIK_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "code.fast",
"input": "Summarize the diff in three bullets.",
"session_id": "ses_01jy7n7w2z9pcu7a5q2gh8askm",
"branch_id": "br_01jy7n7w30ardv8b6r3jk9btln",
"qos": { "class": "interactive", "target_ttft_ms": 500, "deadline_ms": 8000, "degrade_policy": "allow_compatible_fallback" }
}'from zumik import Zumik
zk = Zumik()
resp = zk.responses.create(
model="code.fast",
input="Summarize the diff in three bullets.",
session_id=session.id,
branch_id=session.default_branch_id,
qos={"class": "interactive", "target_ttft_ms": 500, "deadline_ms": 8000},
){
"id": "rsp_01jy7ndf89g0h1j2k3l4m5n6op",
"object": "response",
"session_id": "ses_01jy7n7w2z9pcu7a5q2gh8askm",
"branch_id": "br_01jy7n7w30ardv8b6r3jk9btln",
"status": "completed",
"model": "code.fast",
"execution_profile": "managed_provider",
"output_text": "- Adds a CAS guard to branch append\n- Cleans up the dead retry shim\n- Bumps the test fixtures",
"qos_outcome": {
"admission": "admitted",
"completion": "completed",
"target_met": true,
"ttft_ms": 312,
"latency_ms": 1840,
"deadline_met": true,
"degraded": false,
"fallback_used": false,
"reason_code": null
}
}idstringOpaque response id, prefixed rsp_.
objectstringAlways response.
session_idstringThe associated session, or null.
branch_idstringThe associated branch, or null.
statusstringcompleted on success; cancelled after a cancel.
modelstringThe requested model, echoed back.
execution_profilestringWhich profile served the request: managed_provider, byok, or subscription.
output_textstringThe generated text.
qos_outcomeobjectThe formal QoS outcome.
qos_outcome
admissionstringOne of admitted, queued, rejected, expired_before_start.
completionstringOne of completed, failed, cancelled, expired_during_execution.
target_metbooleanWhether target_ttft_ms was met, or null when no target was set.
ttft_msintegerObserved time to first token.
latency_msintegerObserved total latency.
deadline_metbooleanWhether deadline_ms was met, or null when none was set.
degradedbooleanWhether the request was served in a degraded mode.
fallback_usedbooleanWhether a fallback profile served the request.
reason_codestringA stable reason code when a target was missed, otherwise null. One of queue_saturation, provider_rate_limit, provider_timeout, region_unavailable, alias_no_compatible_target, cache_miss, cache_transfer_slower_than_recompute, fallback_profile_used, customer_deadline_too_short.
Retrieve a response
GET /v2/responses/{response_id}
response_idstringpathrequiredThe rsp_... id to fetch.
curl https://api.zumik.ai/v2/responses/rsp_01jy7ndf89g0h1j2k3l4m5n6op \
-H "Authorization: Bearer $ZUMIK_API_KEY"Returns the stored response object.
Cancel a response
POST /v2/responses/{response_id}/cancel
Marks a stored response cancelled. Returns the updated object.
response_idstringpathrequiredThe rsp_... id to cancel.
curl -X POST https://api.zumik.ai/v2/responses/rsp_01jy7ndf89g0h1j2k3l4m5n6op/cancel \
-H "Authorization: Bearer $ZUMIK_API_KEY"Errors
| Status | Code | When |
|---|---|---|
| 400 | invalid_request_error | input is empty, or session_id / branch_id does not exist in this project. |
| 401 | invalid_api_key | Missing or invalid API key. |
| 402 | credits_required | The prepaid credit balance is empty. |
| 403 | region_not_allowed | The resolved region is forbidden by this project's regional policy. |
| 404 | invalid_request_error | The response does not exist in this project. |
| 429 | quota_exceeded / rate_limit_exceeded | Budget reached, or per-key rate limit exceeded. |
| 504 | deadline_exceeded | The QoS deadline_ms elapsed before the provider responded. The request is not charged. |
See the full table on errors.