The three identity layers
Logical identity, materialization identity, and KV-realization compatibility - why the same logical state can need different physical caches, and why cache details never reach the public API.
This is the architectural decision the rest of the platform depends on. Get it wrong and cache implementation details leak into your product semantics: handles stop being stable, deletion gets confusing, and "is this reusable?" turns into a question no one can answer. Zumik separates identity into three layers and never collapses them.
Layer 1: logical identity
Logical identity describes the customer-visible reusable state: artifacts, bundles, sessions, branches, and snapshots. It answers "what content does the model see, and in what order?"
Logical identity is independent of provider, model, tokenizer, engine, and GPU topology. A snapshot is the same snapshot whether it runs on OpenAI today or a self-hosted SGLang cluster tomorrow. This is the layer your handles point at.
Layer 2: materialization identity
Materialization identity describes the exact model-visible representation: the token sequence that the prompt compiler, chat template, and tokenizer produce from the logical state. Change the tokenizer or the chat template and the logical content stays the same, but the materialization differs.
The materialization key is a digest over:
materialization_key =
    snapshot_id
    + prompt_compiler_revision
    + tokenizer_revision
    + chat_template_revision
    + tool_serialization_revision
    + response_schema_serialization_revision
    + ordered_block_manifest_digest

Fields are length-prefixed before hashing, so ("ab", "c") and ("a", "bc") never collide into the same digest. Two materializations are equal only when every field matches.
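The length-prefixing rule can be sketched as follows. This is a minimal illustration, not Zumik's actual encoding: the choice of SHA-256, the 8-byte big-endian length prefix, and the function names `length_prefixed_digest`, `naive_digest`, and `materialization_key` are all assumptions made for the example. Only the field names and their order come from the key definition above.

```python
import hashlib
import struct

# Field order follows the key definition above; the byte encoding is assumed.
MATERIALIZATION_FIELDS = (
    "snapshot_id",
    "prompt_compiler_revision",
    "tokenizer_revision",
    "chat_template_revision",
    "tool_serialization_revision",
    "response_schema_serialization_revision",
    "ordered_block_manifest_digest",
)


def length_prefixed_digest(values):
    """Hash a sequence of strings, prefixing each with its byte length."""
    h = hashlib.sha256()
    for value in values:
        data = value.encode("utf-8")
        h.update(struct.pack(">Q", len(data)))  # 8-byte big-endian length
        h.update(data)
    return h.hexdigest()


def naive_digest(values):
    """Plain concatenation, shown only to illustrate the ambiguity."""
    return hashlib.sha256("".join(values).encode("utf-8")).hexdigest()


def materialization_key(fields):
    """Digest all seven fields in fixed order; a missing field raises KeyError."""
    return length_prefixed_digest([fields[name] for name in MATERIALIZATION_FIELDS])


# Without length prefixes the field boundary is lost and both inputs hash alike.
assert naive_digest(["ab", "c"]) == naive_digest(["a", "bc"])
# With length prefixes the two inputs feed different bytes to the hash.
assert length_prefixed_digest(["ab", "c"]) != length_prefixed_digest(["a", "bc"])
```

Changing any single field, for example `tokenizer_revision`, yields a different key, which matches the rule that two materializations are equal only when every field matches.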
Layer 3: KV-realization compatibility
KV-realization compatibility answers a narrower, physical question: can an existing GPU-resident KV cache be reused safely for this request? It extends the materialization digest with every runtime input that affects the cache's physical layout, placement, or isolation:
kv_compatibility_key =
    materialization_digest
    + resolved_model_revision
    + weight_digest
    + quantization_profile
    + engine_family + engine_version
    + cache_abi_revision
    + attention_backend
    + rope_configuration
    + parallelism_topology
    + block_size
    + cache_format_revision
    + region
    + isolation_namespace_generation

That last field, isolation_namespace_generation, is what a purge increments. Bump it and every prior KV entry in that namespace fails the compatibility check, so stale physical cache cannot be resurrected after a deletion.
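The effect of a purge on the compatibility key can be sketched like this. It is an illustration under stated assumptions: the `KVCompatibility` dataclass, the canonical-JSON encoding, the in-memory dictionary standing in for a KV cache index, and every field value are hypothetical. Only the field names come from the key definition above.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class KVCompatibility:
    """Hypothetical container for the Layer 3 inputs listed above."""
    materialization_digest: str
    resolved_model_revision: str
    weight_digest: str
    quantization_profile: str
    engine_family: str
    engine_version: str
    cache_abi_revision: str
    attention_backend: str
    rope_configuration: str
    parallelism_topology: str
    block_size: int
    cache_format_revision: str
    region: str
    isolation_namespace_generation: int

    def key(self) -> str:
        # Canonical JSON (assumed encoding) keeps field boundaries unambiguous.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# All values below are placeholders invented for the example.
before = KVCompatibility(
    materialization_digest="mat-digest-0",
    resolved_model_revision="model-rev-0",
    weight_digest="weights-0",
    quantization_profile="fp8",
    engine_family="engine-a",
    engine_version="1.0",
    cache_abi_revision="abi-0",
    attention_backend="backend-a",
    rope_configuration="rope-0",
    parallelism_topology="tp2",
    block_size=16,
    cache_format_revision="fmt-0",
    region="region-a",
    isolation_namespace_generation=7,
)

# A dictionary stands in for the index of GPU-resident KV entries.
kv_index = {before.key(): "kv-blocks-0"}

# A purge increments the generation; nothing in the index is edited.
after_purge = replace(
    before,
    isolation_namespace_generation=before.isolation_namespace_generation + 1,
)

assert kv_index.get(before.key()) == "kv-blocks-0"  # pre-purge key still maps
assert kv_index.get(after_purge.key()) is None      # post-purge requests miss
```

After the purge, requests are keyed with the new generation, so the old entry can no longer be reached even though it has not yet been physically evicted.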
The scenarios that matter
Two requests can share the same logical artifact yet still require different physical caches. This table is the whole point of the model.
| Scenario | Same logical? | Same materialization? | Same KV realization? |
|---|---|---|---|
| Same repository instructions, same model and provider | yes | likely | maybe |
| Same instructions, different tokenizer | yes | no | no |
| Same instructions, same model, different quantization | yes | yes | no |
| Same instructions, managed provider vs. BYOC | yes | maybe | no |
| Same session, different branch head | partial | no | no |
| Same artifacts, reordered tool schema | maybe | no | no |
Notice that "same logical" can pair with "no KV realization". That is normal and expected. A reusable handle is a promise about logical identity, never a promise that a physical cache hit will occur. The platform reports the difference through reuse metrics.
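One way to picture the table is a comparison that walks the layers in order and reports the deepest one at which two requests still agree. The sketch below is hypothetical: `RequestIdentity`, `reuse_level`, and the returned labels are invented for illustration and are not the platform's reuse-metric names. It also treats each layer as a strict yes or no, whereas the table allows "partial" and "maybe".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestIdentity:
    """Hypothetical bundle of the three identities for one request."""
    logical_id: str            # Layer 1: e.g. a snapshot handle
    materialization_key: str   # Layer 2 digest
    kv_compatibility_key: str  # Layer 3 digest


def reuse_level(cached: RequestIdentity, incoming: RequestIdentity) -> str:
    """Return the deepest layer at which the two requests still match."""
    if cached.logical_id != incoming.logical_id:
        return "none"
    if cached.materialization_key != incoming.materialization_key:
        return "logical"
    if cached.kv_compatibility_key != incoming.kv_compatibility_key:
        return "materialization"
    return "kv"


cached = RequestIdentity("snap-1", "mat-a", "kv-a")

# Same instructions, same model, different quantization: only Layer 3 differs.
different_quantization = RequestIdentity("snap-1", "mat-a", "kv-b")
# Same instructions, different tokenizer: Layers 2 and 3 both differ.
different_tokenizer = RequestIdentity("snap-1", "mat-b", "kv-c")

assert reuse_level(cached, different_quantization) == "materialization"
assert reuse_level(cached, different_tokenizer) == "logical"
assert reuse_level(cached, cached) == "kv"
```

In both mismatching cases the handle `snap-1` remains valid; only the expectation of a physical cache hit changes.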
Why this keeps the API clean
Because handles live at Layer 1, they survive everything that happens at Layers 2 and 3. You can switch quantization, move regions, or migrate from a managed provider to your own cluster, and your artifact and session IDs do not change. The cache churns; the contract does not.
It also makes cache invalidation tractable. Any change to a Layer 2 or Layer 3 input produces a different key, so a stale cache can never match a request it should not serve. There is no separate invalidation pass to get wrong: identity is the invalidation rule.
Snapshots
A compiled, ordered logical state pinned to a branch head - the unit a response binds to, sitting at Layer 1 of the identity model.
Handles and fingerprints
Opaque public IDs that callers hold, versus internal tenant-scoped HMAC fingerprints that never leave the isolation boundary - and why raw content hashes are never exposed.