Responses
The OpenAI-compatible Responses API on Zumik. Create, retrieve, delete, and cancel responses, list input items, compact context, and count input tokens.
The Responses API is OpenAI's stateful generation surface. Zumik mirrors its shapes exactly. The input field accepts either a bare string or the structured input-item array. Proprietary signal rides on response headers, never the body.
Raw input items are retained for /input_items only when the project opts into full-fidelity retention. The default is metadata-only, so input items return an empty list. See retention and QoS.
Create a response
POST https://api.zumik.ai/v1/responsesmodelstringrequiredA Zumik alias such as code.balanced, or a concrete provider model.
inputstring | arrayrequiredEither a plain string prompt, or an array of structured input items (matching OpenAI's input-item shape).
Request
curl https://api.zumik.ai/v1/responses \
-H "Authorization: Bearer zk_live_..." \
-H "Content-Type: application/json" \
-d '{"model":"code.balanced","input":"Review the latest patch."}'Response
{
"id": "resp_01jy7n3q8v6m4k2x...",
"object": "response",
"created_at": 1750000123,
"status": "completed",
"model": "code.balanced",
"output": [
{
"type": "message",
"role": "assistant",
"content": [
{ "type": "output_text", "text": "The patch looks correct; add a test for the empty case." }
]
}
],
"usage": {
"input_tokens": 6,
"output_tokens": 11,
"total_tokens": 17
}
}idstringThe response id, resp_....
objectstringAlways "response".
created_atintegerUnix timestamp (seconds).
statusstringcompleted, or cancelled after a cancel.
outputarrayOutput items. Each message item carries a role and a content array of parts (type: "output_text" with text).
usageobjectinput_tokens, output_tokens, total_tokens.
Retrieve a response
GET https://api.zumik.ai/v1/responses/{response_id}response_idstringpathrequiredThe resp_... id.
Returns the stored response object exactly as it was created. 404 if it does not exist for this project.
curl https://api.zumik.ai/v1/responses/resp_01jy... \
-H "Authorization: Bearer zk_live_..."Delete a response
DELETE https://api.zumik.ai/v1/responses/{response_id}{ "id": "resp_01jy...", "object": "response.deleted", "deleted": true }404 if the response does not exist for this project.
Cancel a response
POST https://api.zumik.ai/v1/responses/{response_id}/cancelMarks the response cancelled and returns the updated object. 404 if it does not exist for this project.
{ "id": "resp_01jy...", "object": "response", "status": "cancelled", "...": "..." }List input items
GET https://api.zumik.ai/v1/responses/{response_id}/input_itemsReturns the input items that produced the response, as a list.
{
"object": "list",
"data": [
{ "type": "input_text", "text": "Review the latest patch." }
],
"has_more": false
}When the project's retention is metadata-only (the default), inputs were not stored, so data is an empty list rather than a 404. A genuine unknown id returns 404.
Compact context
POST https://api.zumik.ai/v1/responses/compactFold a long prior context into a shorter, model-generated summary to stay within a context
budget. The summary is produced through the broker (degrading to a deterministic summary when no
gateway is configured) and persisted as a reusable compaction_summary artifact
you can reference later. Because it makes a model call, this endpoint is budget-gated like inference.
inputstring | arrayrequiredThe context (string or input items) to compact.
modelstringThe model to summarize with. Defaults to auto.cheapest.
{
"object": "response.compaction",
"id": "cmp_01jy...",
"artifact_id": "art_01jy...",
"model": "gemini/gemini-2.0-flash",
"summary": { "type": "compaction_summary", "text": "The agent chose Postgres, wrote the schema, ran tests, and fixed two failures." },
"original_input_tokens": 1840,
"compacted_input_tokens": 460,
"live": true
}objectstringAlways "response.compaction".
artifact_idstringThe persisted compaction_summary artifact (art_...) holding the summary, so it can be reused or referenced from a session event.
modelstringThe resolved provider/model that produced the summary.
original_input_tokensintegerToken count of the input before compaction.
compacted_input_tokensintegerToken count after compaction; never greater than the original.
livebooleantrue when a model produced the summary; false on the deterministic degrade path.
For session-aware compaction — folding a branch's older turns into a checkpoint while keeping a verbatim tail, with recovery — see compacting a branch.
Count input tokens
POST https://api.zumik.ai/v1/responses/input_tokensEstimate the token count of an input without running a generation.
inputstring | arrayrequiredThe input to count.
{ "object": "response.input_tokens", "input_tokens": 6 }Errors
| HTTP | code | When |
|---|---|---|
| 401 | invalid_api_key | Missing or invalid bearer key. |
| 402 | credits_required | The prepaid credit balance is empty (create only). |
| 404 | (none) | The response id does not exist for this project. |
| 429 | quota_exceeded | Budget reached (create only). |
| 504 | deadline_exceeded | A QoS deadline elapsed (create only); not charged. |
See the full error reference.
Pagination
How Zumik list endpoints return results. Lists are returned in full as a single page; the list envelope carries has_more for forward compatibility.
Chat completions
POST /v1/chat/completions. The exact OpenAI chat-completions shape, including streaming over Server-Sent Events and stream_options.include_usage.