Agent runtime
A multi-turn agent loop on Zumik's native /v2 session and branch surface - an append-only transcript with optimistic concurrency and interactive QoS.
A multi-turn agent loop on Zumik's native, stateful /v2 surface. The branch is the agent's append-only transcript; the session carries any pinned base context. Each turn appends events with optimistic concurrency and runs a code.fast response under an interactive QoS request.
Source: examples/agent-runtime. It is fully native - one surface, no OpenAI client.
Run
export ZUMIK_API_KEY="zk_..."
pip install -r requirements.txt # httpx==0.28.1
python runtime.pyThe loop
Open a session and use its default branch
The session returns a default_branch_id. The runtime tracks the branch's optimistic-concurrency position in a small cursor (version and head event id).
r = client.post("/v2/sessions", headers=h, json={})
s = r.json()
cur = BranchCursor(session_id=s["id"], branch_id=s["default_branch_id"], version=0)Append the user turn
Each turn first records a user_message event. The append carries expected_version and expected_head_event_id, so a stale writer is rejected rather than silently rewriting history.
r = client.post(
f"/v2/sessions/{cur.session_id}/branches/{cur.branch_id}/events",
headers=h,
json={
"expected_version": cur.version,
"expected_head_event_id": cur.head_event_id,
"event": {"event_type": "user_message"},
},
)
if r.status_code == 409:
raise RuntimeError(f"branch version conflict: {r.json()}")
evt = r.json()
cur.version, cur.head_event_id = evt["sequence"], evt["id"]Run the response against the session and branch
Generation goes through /v2/responses with an interactive QoS request: a low TTFT target, a hard deadline, and a compatible fallback so a saturated primary degrades instead of failing the turn.
INTERACTIVE_QOS = {
"class": "interactive",
"target_ttft_ms": 500,
"deadline_ms": 8000,
"degrade_policy": "allow_compatible_fallback",
}
r = client.post("/v2/responses", headers=h, json={
"model": "code.fast",
"input": text,
"session_id": cur.session_id,
"branch_id": cur.branch_id,
"qos": INTERACTIVE_QOS,
})
reply = r.json()["output_text"]Append the assistant reply
Record an assistant_message event the same way, advancing the cursor again. The branch is now the full transcript of the turn.
Optimistic concurrency
Every append advances the branch's version and head_event_id. Because each request sends the expected_version it last saw (plus the expected_head_event_id), two writers racing on the same branch cannot clobber each other: the stale one gets a branch_version_conflict (HTTP 409) and can re-read and retry. This is what makes the append-only transcript safe under concurrency. See sessions and branching and QoS.
Notes
- Event types are the native enum:
user_message,assistant_message,tool_result,retrieval_result,checkpoint,note. - To attach a tool call's input or output to the transcript, store it as an artifact and pass the artifact id as the event's
payload_ref. - To explore an alternative without disturbing the main line, fork a branch with
POST /v2/sessions/{sid}/branches(fork_from_branch_id, optionalfork_from_event_id).
RAG
Retrieval-augmented generation where the retrieved documents form a stable prefix pinned to a session, so repeated queries reuse the compiled document context.
Execution profiles
The four ways Zumik runs a request - managed providers by default, BYOK, BYOC, and hybrid - plus the OpenRouter emergency fallback, and the rule that BYOC is only ever activated when replay proves it.