Responses

The OpenAI-compatible Responses API on Zumik. Create, retrieve, delete, and cancel responses, list input items, compact context, and count input tokens.

The Responses API is OpenAI's stateful generation surface. Zumik mirrors its shapes exactly. The input field accepts either a bare string or the structured input-item array. Proprietary signal rides on response headers, never the body.

Raw input items are retained for /input_items only when the project opts into full-fidelity retention. The default is metadata-only, so input items return an empty list. See retention and QoS.

Create a response

POST https://api.zumik.ai/v1/responses

modelstringrequired

A Zumik alias such as code.balanced, or a concrete provider model.

inputstring | arrayrequired

Either a plain string prompt, or an array of structured input items (matching OpenAI's input-item shape).

Request

curl

curl https://api.zumik.ai/v1/responses \
  -H "Authorization: Bearer zk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"model":"code.balanced","input":"Review the latest patch."}'

Response

{
  "id": "resp_01jy7n3q8v6m4k2x...",
  "object": "response",
  "created_at": 1750000123,
  "status": "completed",
  "model": "code.balanced",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "The patch looks correct; add a test for the empty case." }
      ]
    }
  ],
  "usage": {
    "input_tokens": 6,
    "output_tokens": 11,
    "total_tokens": 17
  }
}

idstring

The response id, resp_....

objectstring

Always "response".

created_atinteger

Unix timestamp (seconds).

statusstring

completed, or cancelled after a cancel.

outputarray

Output items. Each message item carries a role and a content array of parts (type: "output_text" with text).

usageobject

input_tokens, output_tokens, total_tokens.

Retrieve a response

GET https://api.zumik.ai/v1/responses/{response_id}

response_idstringpathrequired

The resp_... id.

Returns the stored response object exactly as it was created. 404 if it does not exist for this project.

curl https://api.zumik.ai/v1/responses/resp_01jy... \
  -H "Authorization: Bearer zk_live_..."

Delete a response

DELETE https://api.zumik.ai/v1/responses/{response_id}

{ "id": "resp_01jy...", "object": "response.deleted", "deleted": true }

404 if the response does not exist for this project.

Cancel a response

POST https://api.zumik.ai/v1/responses/{response_id}/cancel

Marks the response cancelled and returns the updated object. 404 if it does not exist for this project.

{ "id": "resp_01jy...", "object": "response", "status": "cancelled", "...": "..." }

List input items

GET https://api.zumik.ai/v1/responses/{response_id}/input_items

Returns the input items that produced the response, as a list.

{
  "object": "list",
  "data": [
    { "type": "input_text", "text": "Review the latest patch." }
  ],
  "has_more": false
}

When the project's retention is metadata-only (the default), inputs were not stored, so data is an empty list rather than a 404. A genuine unknown id returns 404.

Compact context

POST https://api.zumik.ai/v1/responses/compact

Fold a long prior context into a shorter, model-generated summary to stay within a context budget. The summary is produced through the broker (degrading to a deterministic summary when no gateway is configured) and persisted as a reusable compaction_summary artifact you can reference later. Because it makes a model call, this endpoint is budget-gated like inference.

inputstring | arrayrequired

The context (string or input items) to compact.

modelstring

The model to summarize with. Defaults to auto.cheapest.

{
  "object": "response.compaction",
  "id": "cmp_01jy...",
  "artifact_id": "art_01jy...",
  "model": "gemini/gemini-2.0-flash",
  "summary": { "type": "compaction_summary", "text": "The agent chose Postgres, wrote the schema, ran tests, and fixed two failures." },
  "original_input_tokens": 1840,
  "compacted_input_tokens": 460,
  "live": true
}

objectstring

Always "response.compaction".

artifact_idstring

The persisted compaction_summary artifact (art_...) holding the summary, so it can be reused or referenced from a session event.

modelstring

The resolved provider/model that produced the summary.

original_input_tokensinteger

Token count of the input before compaction.

compacted_input_tokensinteger

Token count after compaction; never greater than the original.

liveboolean

true when a model produced the summary; false on the deterministic degrade path.

For session-aware compaction — folding a branch's older turns into a checkpoint while keeping a verbatim tail, with recovery — see compacting a branch.

Count input tokens

POST https://api.zumik.ai/v1/responses/input_tokens

Estimate the token count of an input without running a generation.

inputstring | arrayrequired

The input to count.

{ "object": "response.input_tokens", "input_tokens": 6 }

Errors

HTTP	`code`	When
401	`invalid_api_key`	Missing or invalid bearer key.
402	`credits_required`	The prepaid credit balance is empty (create only).
404	(none)	The response id does not exist for this project.
429	`quota_exceeded`	Budget reached (create only).
504	`deadline_exceeded`	A QoS deadline elapsed (create only); not charged.

See the full error reference.

Responses

On this page