The inference control plane
Why persistent agents pay repeatedly for the same context, and how Zumik measures reuse, preserves state, routes cheaply, reproduces routing, and proves deletion.
An agent does not send one prompt. It sends thousands, and most of them are nearly identical. The system instructions, the tool registry, the response schema, the repository policy, the long-lived documents, the conversation so far: that bulk is stable. What changes between calls is a short suffix at the end. Every time the stable part is re-tokenized and re-prefilled, the customer pays again, in latency and in tokens, for work that was already done.
Zumik is the control plane that sits over that pattern. It does not try to be a faster inference engine or a cheaper model marketplace. It measures how much of your input is genuinely reusable, gives you stable references for the reusable parts, routes each request to the cheapest reliable path, records exactly how that routing decision was made so it can be reproduced, and produces signed evidence when you delete data.
The five jobs
Measure reuse honestly
A repeated prefix is not a cache hit. Zumik reports reuse opportunity and realized capture as separate numbers, each tagged with an evidence level, so a prediction is never mistaken for a fact.
Preserve state as a first-class object
Reusable inputs become artifacts, bundles, sessions, branches, and snapshots: a small set of objects with explicit immutability and ordering rules, not opaque cache keys.
Route to the cheapest reliable path
Managed providers by default, with BYOK and BYOC as evidence-backed escalations. One scheduler owns replica selection per profile; provider-native caching does the heavy lifting before you ever self-host.
Reproduce every decision
Aliases resolve through immutable releases. Snapshots pin ordering and compiler versions. A response pins one snapshot and one alias release, so any past routing decision can be replayed.
Prove deletion
Delete revokes access; purge removes retained representations and returns a signed receipt with a guarantee class that never exceeds what the underlying profile can actually deliver.
Two surfaces, one engine
You reach the control plane through either of two public APIs over the same internal execution system.
| Surface | Purpose | Shape |
|---|---|---|
/v1 | Migrate an existing OpenAI integration with one base-URL change | Byte-for-byte OpenAI request and response objects |
/v2 | Use explicit state, branching, replay, purge, QoS, and rich telemetry | Native Zumik objects with opaque handles |
Proprietary behavior never leaks into /v1 JSON. It rides on optional headers, or it lives on /v2. A client that sends no Zumik headers still gets correct OpenAI behavior. See API surfaces for the boundary rules.
The idea that holds it together
The single most important distinction in the platform is that logical state is not physical KV state. Two requests can reference the same logical artifact and still need entirely different physical caches, because a different tokenizer, a different quantization, or a different region all break KV compatibility while leaving the logical content untouched.
Keeping these layers apart is what lets handles stay stable and opaque while caches churn underneath. It is worth reading the identity model before anything else.
Where the concepts connect
Reuse metrics
Opportunity vs. realized reuse, and the five evidence levels.
Workload Reuse Score
The six-component score and the separate deployment-readiness score.
Identity model
Logical, materialization, and KV-realization layers.
Artifacts
Immutable reusable content units and their types.
Bundles
Ordered, immutable lists of artifacts.
Sessions
Causal state containers for an agent workflow.
Branches
Append-only event lines with optimistic concurrency.
Snapshots
Compiled logical state pinned to a branch head.
Handles and fingerprints
Opaque public IDs vs. internal tenant-scoped HMACs.
Model aliases
Reproducible aliases and immutable releases.
Agent Hints
The vendor-neutral intent contract.
QoS
Classes, requests, and formal outcome objects.
Execution profiles
Managed, BYOK, BYOC, hybrid, and fallback.
Capability manifests
Per-provider capability records that gate routing and purge.
Retention and purge
Delete vs. purge, guarantee classes, and resurrection prevention.
API surfaces
The /v1 and /v2 contract and the extension rule.
New to the product? Start at the Quickstart, then read the identity model and reuse metrics. Everything else builds on those two.