Coding agent

A coding agent whose repo policy and tool bundle are a stable, reused prefix - built as artifacts, pinned to a session, then streamed over code.fast.

A coding agent has a stable prefix - the repository policy plus the tool bundle the model may call - that is identical on every turn. This example uploads it once as immutable artifacts, assembles it into an agent_prefix bundle, pins the bundle to a session, and then streams each turn over code.fast. The policy and tools are never resent in the message body; the compiled prefix is reused, which is exactly the reuse Zumik measures.

Source: examples/coding-agent. The example uses two API surfaces, both of them real endpoints: the native /v2 API to build the prefix, and the OpenAI-compatible /v1/chat/completions SSE endpoint to stream tokens.

Run

export ZUMIK_API_KEY="zk_..."
pip install -r requirements.txt   # httpx==0.28.1, openai==1.99.9
python agent.py
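
Since the run depends on ZUMIK_API_KEY being exported, a script can fail early with a clear message when it is missing. The following is a minimal sketch; load_api_key is a hypothetical helper and is not taken from examples/coding-agent.

```python
import os


def load_api_key(env_var: str = "ZUMIK_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper: the variable name comes from the Run step above,
    everything else is an illustration.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running agent.py")
    return key
```

Calling key = load_api_key() at the top of the script surfaces a missing variable before any request is sent.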

Walkthrough

Upload the stable prefix as artifacts

The repo policy goes up as a policy artifact; the tool definitions as a tool_bundle_source artifact (with content_media_type: application/json). Both are immutable.

policy = client.post("/v2/artifacts", headers=h, json={
    "artifact_type": "policy",
    "content": REPO_POLICY,
})
tools = client.post("/v2/artifacts", headers=h, json={
    "artifact_type": "tool_bundle_source",
    "content": TOOL_BUNDLE,
    "content_media_type": "application/json",
})
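
The later steps refer to policy_id and tools_id, which come from these two responses. One way to read them is sketched below. It assumes the artifact response body is a JSON object with an "id" field, mirroring the s["id"] shown for sessions; that field name is an assumption, and created_id is a hypothetical helper.

```python
from typing import Any, Protocol


class JsonResponse(Protocol):
    """The two httpx.Response methods this helper relies on."""

    def raise_for_status(self) -> Any: ...

    def json(self) -> Any: ...


def created_id(response: JsonResponse) -> str:
    """Fail on an HTTP error status, then return the created resource's id.

    Assumption: the response body is a JSON object with an "id" field.
    """
    response.raise_for_status()  # raises httpx.HTTPStatusError on 4xx/5xx
    body = response.json()
    return body["id"]
```

With this helper, policy_id = created_id(policy) and tools_id = created_id(tools) would give the values used when assembling the bundle.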

Assemble an agent_prefix bundle

The policy and tools become one ordered bundle. Each item carries a role (developer, tools) so the prompt compiler lays them out as a stable prefix.

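# policy_id and tools_id are the ids of the two artifacts uploaded above.
prefix = client.post("/v2/bundles", headers=h, json={
    "bundle_type": "agent_prefix",
    "items": [  # ordered: the policy first, then the tools
        {"artifact_id": policy_id, "role": "developer"},
        {"artifact_id": tools_id, "role": "tools"},
    ],
})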

Open a session over the bundle

The session pins the bundle as its base context and returns a default_branch_id. That session is the reusable context for every turn.

# prefix_id is the id of the agent_prefix bundle created above.
session = client.post("/v2/sessions", headers=h, json={
    "base_bundle_ids": [prefix_id],
})
s = session.json()  # s["id"], s["default_branch_id"]
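
The two fields named in the comment above are what every later turn needs. A small sketch of keeping them together follows; SessionRef is a hypothetical type introduced only for illustration, while the "id" and "default_branch_id" keys are the ones shown above.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class SessionRef:
    """The session and branch ids a turn is attributed to (hypothetical type)."""

    session_id: str
    branch_id: str

    @classmethod
    def from_payload(cls, payload: Mapping[str, Any]) -> "SessionRef":
        # Keys as documented above: s["id"] and s["default_branch_id"].
        return cls(session_id=payload["id"], branch_id=payload["default_branch_id"])
```

For example, ref = SessionRef.from_payload(s) yields ref.session_id and ref.branch_id, which can be passed on when building the Agent-Hints header.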

Stream turns over code.fast

Generation runs through the OpenAI-compatible /v1 stream. The session and branch ride on the Agent-Hints header (base64url JSON), so the platform attributes the turn to the session and reuses its compiled prefix. The message body carries only the user turn - never the policy or tools.

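from openai import OpenAI

# sid and bid are s["id"] and s["default_branch_id"] from the session response.
oa = OpenAI(
    base_url=V1_BASE,
    api_key=key,
    default_headers={"Agent-Hints": session_hints(sid, bid)},
)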
stream = oa.chat.completions.create(
    model="code.fast",
    messages=[{"role": "user", "content": turn}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
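
If the full reply is needed after streaming, for logging or post-processing, the loop above can be wrapped so that it also accumulates the text. This is a sketch rather than the example's own code: collect_stream is a hypothetical helper, and the guard for chunks with an empty choices list is a precaution based on how OpenAI-style streams can behave (for instance a trailing usage chunk), not something this page states about code.fast.

```python
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any], echo: bool = True) -> str:
    """Print streamed deltas as they arrive and return the concatenated text.

    Works on any iterable of OpenAI-style chunk objects, i.e. objects with
    `.choices[0].delta.content`.
    """
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            # Defensive: skip chunks that carry no choices.
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
            if echo:
                print(delta, end="", flush=True)
    return "".join(parts)
```

Usage would be reply = collect_stream(stream), with stream being the object returned by oa.chat.completions.create(..., stream=True).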

The Agent-Hints header

Agent-Hints is a base64url-encoded JSON contract that binds an OpenAI-compatible request to native state without putting anything proprietary in the body. The example builds it like this:

import base64
import json

hints = {
    "version": "2026-06-01",
    "session": {"session_id": session_id, "branch_id": branch_id},
    "reuse": {"preference": "prefer", "scope": "session"},
    "qos": {"class": "interactive", "target_ttft_ms": 500, "deadline_ms": 5000},
}
# base64url-encode the JSON and strip the trailing "=" padding.
header = base64.urlsafe_b64encode(json.dumps(hints).encode()).rstrip(b"=").decode()

This is the bridge between the two surfaces: build and pin state on /v2, then run generation on /v1 with the stock OpenAI client and attribute it to the session. See OpenAI compatibility for the header rules and streaming.
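
The header construction can be packaged as the session_hints(sid, bid) function that the streaming step calls, together with a decoder for inspecting a header while debugging. This is a sketch under assumptions: the body of session_hints is presumed to match the snippet above, and decode_hints is a hypothetical helper that is not part of the example.

```python
import base64
import json


def session_hints(session_id: str, branch_id: str) -> str:
    """Build the Agent-Hints value: base64url JSON without '=' padding."""
    hints = {
        "version": "2026-06-01",
        "session": {"session_id": session_id, "branch_id": branch_id},
        "reuse": {"preference": "prefer", "scope": "session"},
        "qos": {"class": "interactive", "target_ttft_ms": 500, "deadline_ms": 5000},
    }
    raw = json.dumps(hints).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def decode_hints(header: str) -> dict:
    """Inverse of session_hints, for debugging (hypothetical helper)."""
    # Restore the padding that was stripped so the decoder accepts the input.
    padded = header + "=" * (-len(header) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

Round-tripping a header through decode_hints is a quick way to confirm that the session and branch ids in it are the ones returned by /v2/sessions.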

The example talks to the API directly with httpx and openai, so it does not depend on an unpublished SDK. To use the first-party SDK instead, install the Python SDK and call create_artifact / create_bundle / create_session, which hit the same endpoints.
