
The inference control plane

Why persistent agents pay repeatedly for the same context, and how Zumik measures reuse, preserves state, routes cheaply, reproduces routing, and proves deletion.

An agent does not send one prompt. It sends thousands, and most of them are nearly identical. The system instructions, the tool registry, the response schema, the repository policy, the long-lived documents, the conversation so far: that bulk is stable. What changes between calls is a short suffix at the end. Every time the stable part is re-tokenized and re-prefilled, the customer pays again, in latency and in tokens, for work that was already done.
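To make "most of them are nearly identical" concrete, the sketch below measures how many leading tokens each call shares with the call before it. It is a minimal illustration, not Zumik's metric: whitespace splitting stands in for a real tokenizer, and `shared_prefix_len` and `reuse_opportunity` are hypothetical names.

```python
def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def reuse_opportunity(prompts: list[list[str]]) -> float:
    """Fraction of all input tokens that repeat the previous call's prefix.

    This is an upper bound on what prefix caching could save, not a
    measured saving: nothing here says a cache actually served them.
    """
    total = sum(len(p) for p in prompts)
    if total == 0:
        return 0.0
    reusable = sum(
        shared_prefix_len(prev, cur) for prev, cur in zip(prompts, prompts[1:])
    )
    return reusable / total
```

For two calls such as `"SYS TOOLS SCHEMA hello"` and `"SYS TOOLS SCHEMA hello world"`, four of the nine input tokens repeat a prefix the previous call already paid for.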

Zumik is the control plane that sits over that pattern. It does not try to be a faster inference engine or a cheaper model marketplace. It measures how much of your input is genuinely reusable, gives you stable references for the reusable parts, routes each request to the cheapest reliable path, records exactly how that routing decision was made so it can be reproduced, and produces signed evidence when you delete data.

The five jobs

Measure reuse honestly

A repeated prefix is not a cache hit. Zumik reports reuse opportunity and realized capture as separate numbers, each tagged with an evidence level, so a prediction is never mistaken for a fact.
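One way to keep a prediction from masquerading as a fact is to carry the evidence level alongside every number and let any derived ratio inherit the weakest one. The sketch below assumes three hypothetical evidence levels; the real level names and report shape are not specified here.

```python
from dataclasses import dataclass
from enum import Enum


class Evidence(Enum):
    # Hypothetical evidence levels, ordered from weakest to strongest.
    ESTIMATED = 1          # inferred from prefix overlap alone
    PROVIDER_REPORTED = 2  # taken from a provider's cached-token counts
    MEASURED = 3           # observed directly on infrastructure we control


@dataclass(frozen=True)
class Metric:
    tokens: int
    evidence: Evidence


@dataclass(frozen=True)
class ReuseReport:
    opportunity: Metric  # tokens that could have been served from cache
    captured: Metric     # tokens that were served from cache

    def capture_rate(self) -> float:
        """Share of the opportunity that was actually realized."""
        if self.opportunity.tokens == 0:
            return 0.0
        return self.captured.tokens / self.opportunity.tokens

    def weakest_evidence(self) -> Evidence:
        """A derived ratio is only as trustworthy as its weakest input."""
        return min(self.opportunity.evidence, self.captured.evidence,
                   key=lambda e: e.value)
```

Reporting the two numbers separately means a 75% capture rate built on an estimated opportunity is visibly labeled as such.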

Preserve state as a first-class object

Reusable inputs become artifacts, bundles, sessions, branches, and snapshots: a small set of objects with explicit immutability and ordering rules, not opaque cache keys.
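The difference between these objects and opaque cache keys is that immutability and ordering are explicit. The sketch below models that with frozen dataclasses: content-addressed artifacts, bundles whose order is part of their identity, snapshots that pin a compiler version, and sessions that branch from a snapshot. All class and field names are hypothetical illustrations, not Zumik's object schema.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Artifact:
    content: str

    @property
    def digest(self) -> str:
        # Content-addressed: identical content always yields the same id.
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Bundle:
    # A tuple, not a list: order is part of identity and cannot change.
    artifacts: tuple[Artifact, ...]


@dataclass(frozen=True)
class Snapshot:
    bundle: Bundle
    compiler_version: str  # pinned so the snapshot always renders the same way


@dataclass
class Session:
    head: Snapshot
    history: list[Snapshot] = field(default_factory=list)

    def advance(self, new: Snapshot) -> None:
        """Move the session forward; the old head is kept, never edited."""
        self.history.append(self.head)
        self.head = new

    def branch(self) -> "Session":
        # A branch starts at the current head. Because snapshots are
        # immutable, later changes on either side cannot affect the other.
        return Session(head=self.head, history=list(self.history))
```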

Route to the cheapest reliable path

Managed providers are the default; BYOK (bring your own key) and BYOC (bring your own cloud) are escalations you adopt only when evidence supports them. One scheduler owns replica selection per profile, and provider-native caching does the heavy lifting before you ever self-host.
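"Cheapest reliable path" combines a cost ordering with an evidence gate: a path is not eligible until it has enough observed traffic to show it is reliable. The sketch below assumes hypothetical thresholds (100 attempts, 99% success) and illustrative path labels; it does not model the per-profile scheduler or replica selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str             # illustrative labels: "managed", "byok", "byoc"
    cost_per_mtok: float  # effective cost after provider-native caching
    successes: int
    attempts: int


def is_reliable(r: Route, min_attempts: int = 100, min_rate: float = 0.99) -> bool:
    # Too few observations means no evidence, so the route is not eligible.
    return r.attempts >= min_attempts and r.successes / r.attempts >= min_rate


def choose_route(routes: list[Route], default: str = "managed") -> Route:
    eligible = [r for r in routes if is_reliable(r)]
    if not eligible:
        # Nothing has earned an escalation yet: stay on the managed default.
        return next(r for r in routes if r.name == default)
    # Cheapest reliable route; on a cost tie, prefer the default.
    return min(eligible, key=lambda r: (r.cost_per_mtok, r.name != default))
```

A cheaper self-hosted route with only a handful of requests behind it stays ineligible; once its track record clears the bar, the cost ordering takes over.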

Reproduce every decision

Aliases resolve through immutable releases. Snapshots pin ordering and compiler versions. A response pins one snapshot and one alias release, so any past routing decision can be replayed.
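Replay works because a response records the release it used, not the alias's current target. The sketch below shows that pattern with an append-only release history; `Alias`, `Release`, `ResponseRecord`, and `replay_target` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    number: int
    target: str  # a concrete route or model identifier (illustrative)


class Alias:
    """Append-only release history: published releases are never edited."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._releases: list[Release] = []

    def publish(self, target: str) -> Release:
        rel = Release(number=len(self._releases) + 1, target=target)
        self._releases.append(rel)
        return rel

    def current(self) -> Release:
        return self._releases[-1]

    def release(self, number: int) -> Release:
        if not 1 <= number <= len(self._releases):
            raise KeyError(f"{self.name} has no release {number}")
        return self._releases[number - 1]


@dataclass(frozen=True)
class ResponseRecord:
    snapshot_id: str
    alias: str
    release_number: int  # pinned when the request was served


def replay_target(alias: Alias, record: ResponseRecord) -> str:
    # Resolve through the pinned release, never through the current head.
    return alias.release(record.release_number).target
```

Publishing a new release moves the alias forward for new traffic, while every earlier response still resolves to the target it actually used.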

Prove deletion

Delete revokes access; purge removes retained representations and returns a signed receipt with a guarantee class that never exceeds what the underlying profile can actually deliver.
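The key rule is that the granted guarantee is capped by the profile: a request for a stronger guarantee than the deployment can deliver is downgraded, never overstated. The sketch below assumes three hypothetical guarantee classes and uses an HMAC as a stand-in for the receipt signature; a production receipt would plausibly use a public-key signature so that third parties can verify it without holding the secret.

```python
import hashlib
import hmac
import json
from enum import IntEnum


class Guarantee(IntEnum):
    # Hypothetical guarantee classes, weakest to strongest.
    ACCESS_REVOKED = 1  # delete: the handle no longer resolves
    BEST_EFFORT = 2     # purge issued to every known store
    ATTESTED = 3        # every retained representation confirmed removed


def _payload(body: dict) -> bytes:
    # Canonical JSON so signer and verifier hash identical bytes.
    return json.dumps(body, sort_keys=True).encode("utf-8")


def purge_receipt(handle: str, requested: Guarantee,
                  profile_max: Guarantee, key: bytes) -> dict:
    # Never claim more than the profile can actually deliver.
    granted = min(requested, profile_max)
    body = {"handle": handle, "guarantee": granted.name}
    body["signature"] = hmac.new(key, _payload(body), hashlib.sha256).hexdigest()
    return body


def verify_receipt(receipt: dict, key: bytes) -> bool:
    body = {k: v for k, v in receipt.items() if k != "signature"}
    expected = hmac.new(key, _payload(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["signature"])
```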

Two surfaces, one engine

You reach the control plane through either of two public APIs over the same internal execution system.

| Surface | Purpose | Shape |
| --- | --- | --- |
| /v1 | Migrate an existing OpenAI integration with one base-URL change | Byte-for-byte OpenAI request and response objects |
| /v2 | Use explicit state, branching, replay, purge, QoS, and rich telemetry | Native Zumik objects with opaque handles |

Proprietary behavior never leaks into /v1 JSON. It rides on optional headers, or it lives on /v2. A client that sends no Zumik headers still gets correct OpenAI behavior. See API surfaces for the boundary rules.
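The sketch below illustrates the /v1 boundary from the client side: the JSON body is a plain OpenAI Chat Completions payload, and any Zumik-specific behavior rides in an optional header. The base URL, the `X-Zumik-Session` header, and the model alias are hypothetical placeholders, not documented values; the request is only built, not sent.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical endpoint: the only change from an OpenAI integration is this base URL.
BASE_URL = "https://zumik.example/v1"


def build_chat_request(api_key: str, messages: list,
                       zumik_headers: Optional[dict] = None) -> urllib.request.Request:
    # The body is a standard OpenAI Chat Completions payload; nothing
    # Zumik-specific is ever added to it.
    body = {"model": "your-model-alias", "messages": messages}  # placeholder model
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Optional proprietary behavior travels only in headers.
    headers.update(zumik_headers or {})
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Because the body is identical with or without the extra header, a client that never sets it behaves exactly like an ordinary OpenAI client.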

The idea that holds it together

The single most important distinction in the platform is that logical state is not physical KV state. Two requests can reference the same logical artifact and still need entirely different physical caches, because a change of tokenizer, quantization, or region breaks KV compatibility while leaving the logical content untouched.

Keeping these layers apart is what lets handles stay stable and opaque while caches churn underneath. It is worth reading the identity model before anything else.
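A minimal way to see the two layers is to derive two different keys: a logical identity from content alone, and a physical KV key that also folds in the execution profile. The field names and key derivation below are hypothetical; they illustrate the separation, not Zumik's actual identity model.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionProfile:
    tokenizer: str
    quantization: str
    region: str


def logical_id(content: str) -> str:
    # Depends only on content, so it stays stable across every profile.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def physical_kv_key(logical: str, profile: ExecutionProfile) -> str:
    # Any change of tokenizer, quantization, or region yields a new key,
    # because the cached KV tensors are not interchangeable.
    # JSON encoding avoids delimiter collisions between fields.
    material = json.dumps([logical, profile.tokenizer,
                           profile.quantization, profile.region])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

A handle can point at the logical id indefinitely, while the physical keys beneath it are created and evicted as profiles change.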

Where the concepts connect

New to the product? Start at the Quickstart, then read the identity model and reuse metrics. Everything else builds on those two.
