Prompt layout

The recommended block ordering that preserves a reusable prefix, the reuse-killers that quietly defeat provider caching, and the zumik lint CLI and web prompt-linter that catch them.

Prompt-cache capture lives and dies by layout. Every provider rewards the same shape: keep stable content at the front so the prefix stays byte-identical across requests, and push everything that changes every call to the very end. Get the ordering right once and the discount applies on whichever provider answers. Get a volatile token into the prefix and your hit rate falls to near zero.

Assemble logical blocks from most stable to most dynamic:

1. system policy
2. developer policy
3. stable tool bundle
4. response schema
5. stable tenant/workspace context
6. compacted checkpoints
7. ordered branch history
8. dynamic retrieval blocks
9. latest user input
10. latest tool result

This is exactly what Zumik state encodes: a bundle is the stable prefix (blocks 1-5), a branch is the ordered middle (blocks 6-7), and the latest turn is the volatile tail (blocks 8-10). OpenAI, Anthropic, and Gemini all recommend stable-first construction for the same reason - it maximizes the cacheable prefix.
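The three-part split can be pictured with a small sketch. The `PromptState` class and its method names below are hypothetical illustrations, not Zumik's API; the only thing taken from the text is the grouping of blocks 1-5, 6-7, and 8-10.

```python
from dataclasses import dataclass, field


@dataclass
class PromptState:
    """Hypothetical container mirroring the bundle / branch / tail split."""
    bundle: list = field(default_factory=list)  # blocks 1-5: stable prefix
    branch: list = field(default_factory=list)  # blocks 6-7: ordered middle
    tail: list = field(default_factory=list)    # blocks 8-10: volatile tail

    def assemble(self):
        # Most stable content first, content that changes every call last.
        return [*self.bundle, *self.branch, *self.tail]

    def with_turn(self, tail):
        # A new request keeps the bundle and branch and swaps only the tail.
        return PromptState(self.bundle, self.branch, list(tail))


state = PromptState(
    bundle=["system policy", "tool bundle", "response schema"],
    branch=["branch history"],
    tail=["user: where is my order?"],
)
first = state.assemble()
second = state.with_turn(["user: cancel it please"]).assemble()

# Everything before the tail is identical across the two requests.
print(first[:4] == second[:4])  # True
```

Because only the tail is replaced between requests, the leading blocks stay identical, which is the property the ordering is meant to protect.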

Zumik may recommend prefix-friendly construction, but it will not silently reorder semantically meaningful events. Ordering is your decision; the platform measures whether your decision preserves reuse and tells you when it does not.

Reuse-killers to avoid

A "reuse-killer" is anything that changes the bytes of the stable prefix from one request to the next. Once the prefix differs, the cache cannot match it, and the discount is gone.
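A minimal sketch of why a single changing value matters, assuming a SHA-256 fingerprint as a stand-in for however a provider actually matches prefixes (the real matching happens on the provider side and is not shown here). The function name and sample policy text are illustrative.

```python
import hashlib


def prefix_fingerprint(stable_blocks):
    """Hash the exact bytes of the stable prefix; any byte change alters it."""
    digest = hashlib.sha256()
    for block in stable_blocks:
        digest.update(block.encode("utf-8"))
        digest.update(b"\x00")  # separator so block boundaries count too
    return digest.hexdigest()


policy = "You are a support assistant. Follow the refund policy."

# Reuse-killer: a per-request timestamp embedded in the system block.
bad_a = prefix_fingerprint([policy + " Now: 2024-05-01T10:00:00Z"])
bad_b = prefix_fingerprint([policy + " Now: 2024-05-01T10:00:07Z"])

# Fix: the prefix holds only stable text; the timestamp goes in the user turn.
good_a = prefix_fingerprint([policy])
good_b = prefix_fingerprint([policy])

print(bad_a == bad_b)    # False
print(good_a == good_b)  # True
```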

| Reuse-killer | Why it hurts | Fix |
| --- | --- | --- |
| A timestamp or "current date" in the system block | Changes every request, so the prefix never repeats | Move per-request values into the latest user turn |
| A request id, trace id, session id, or UUID in the prefix | Same: a unique token at the top resets the whole prefix | Keep ids out of stable blocks entirely |
| A freshly shuffled or re-serialized tool list | Reordered tool schema is a different prefix even with identical tools | Serialize tools once, deterministically, and reuse the bytes |
| Dynamic retrieval placed before stable blocks | Pushes the stable content past the cacheable boundary | Put retrieval after the stable prefix, before the latest turn |
| The latest user turn not last | Anything after it cannot be reused | Place the changes-every-request content at the very end |
| A stable prefix below ~1,024 tokens | Below the provider minimum, no cache engages at all | Consolidate instructions, tools, and schema into the prefix |

The single most common cause of a low capture rate is the first row: a volatile value sitting in an otherwise-stable system prompt.
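The tool-list row is the easiest one to fix mechanically. The sketch below shows one way to serialize tools deterministically, assuming each tool is a dictionary with a `name` key (a simplified, hypothetical shape). Sorting by name and sorting object keys are choices made for this example, not something Zumik prescribes.

```python
import json


def serialize_tools(tools):
    """Produce the same bytes regardless of list order or key order."""
    ordered = sorted(tools, key=lambda tool: tool["name"])
    return json.dumps(ordered, sort_keys=True, separators=(",", ":"))


tools_a = [
    {"name": "search", "parameters": {"type": "object", "required": ["q"]}},
    {"name": "lookup_order", "parameters": {"required": ["id"], "type": "object"}},
]
# Same tools, shuffled, as a fresh registry walk might return them.
tools_b = list(reversed(tools_a))

print(serialize_tools(tools_a) == serialize_tools(tools_b))  # True
```

Serializing once at startup and reusing the resulting string avoids depending on iteration order at all.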

Lint your layout

Zumik ships a layout linter that reads an ordered prompt and flags these reuse-killers before you ever spend a token on a cold prefix. It checks layout, not content - whether the structure preserves a reusable prefix, not whether the instructions are any good.

zumik lint (CLI)

zumik lint accepts an OpenAI messages array, a bare message array, or Zumik blocks (the same shape with a kind field):

zumik lint prompt.json
Output
Prompt-layout score: 55/100
Stable-prefix tokens: ~1840

[HIGH] block 0 (system): stable-prefix block contains volatile content (iso timestamp)
        fix: Move per-request values (timestamps, ids, dates) into the latest user turn so the
             prefix stays byte-stable across requests.
[MED ] block 3 (user): UserInput block appears after more dynamic content; this shortens the cacheable prefix
        fix: Reorder so stable content (system, tools, schema, context) precedes history, retrieval,
             and the latest user input.

Add --json for machine-readable output to wire into CI:

{
  "layout_score": 70,
  "stable_prefix_tokens": 1840,
  "findings": [
    {
      "severity": "high",
      "block": 0,
      "role": "system",
      "message": "stable-prefix block contains volatile content (iso timestamp)",
      "fix": "Move per-request values (timestamps, ids, dates) into the latest user turn..."
    }
  ]
}
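A CI gate over this report could look like the sketch below. It relies only on the `layout_score`, `findings`, and `severity` fields shown in the sample; the `gate` function and the threshold of 85 are hypothetical choices for illustration.

```python
import json


def gate(report, min_score=85):
    """Return an exit code: 0 passes the build, 1 fails it."""
    has_high = any(f["severity"] == "high" for f in report["findings"])
    return 1 if has_high or report["layout_score"] < min_score else 0


# Abbreviated copy of the report shown above.
sample = json.loads("""
{
  "layout_score": 70,
  "stable_prefix_tokens": 1840,
  "findings": [
    {"severity": "high", "block": 0, "role": "system",
     "message": "stable-prefix block contains volatile content (iso timestamp)"}
  ]
}
""")

print(gate(sample))  # 1
```

In a pipeline, the report would be loaded from wherever the `--json` output was saved, and the return value passed to `sys.exit`.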

The layout_score starts at 100 and subtracts per finding: 30 for a high-severity issue (volatile content in the prefix), 15 for a medium (ordering regression), 5 for a low (final block is not the latest turn, or the prefix is below the cache minimum). A clean, cache-friendly layout scores 100.
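The arithmetic can be written out as a short sketch. The severity labels `medium` and `low` and the floor at zero are assumptions; the source only states the starting value and the three penalties.

```python
# Penalty per finding, as described above.
PENALTY = {"high": 30, "medium": 15, "low": 5}


def layout_score(severities):
    """Start at 100 and subtract one penalty per finding (floor at 0 assumed)."""
    return max(0, 100 - sum(PENALTY[s] for s in severities))


print(layout_score([]))                 # 100
print(layout_score(["high"]))           # 70
print(layout_score(["medium", "low"]))  # 80
```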

What the linter detects

Volatile content in the prefix (HIGH)

Timestamps, ISO datetimes, UUIDs, long numeric ids, and time/id keywords found inside a system, tools, schema, or context block.
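To make the categories concrete, here is a rough sketch of pattern-based detection. The regular expressions and the keyword list are illustrative guesses, not the patterns the linter actually uses.

```python
import re

# Illustrative patterns only; the real linter's rules are not documented here.
VOLATILE_PATTERNS = {
    "iso timestamp": re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}"),
    "uuid": re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
        r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    ),
    "long numeric id": re.compile(r"\b\d{10,}\b"),
    "time/id keyword": re.compile(
        r"\b(current date|timestamp|request[_ ]id|trace[_ ]id|session[_ ]id)\b",
        re.IGNORECASE,
    ),
}


def find_volatile(text):
    """Return the labels of every volatile pattern found in a stable block."""
    return [label for label, rx in VOLATILE_PATTERNS.items() if rx.search(text)]


print(find_volatile("You are a helpful assistant."))    # []
print(find_volatile("Today is 2024-05-01T10:00:00Z."))  # ['iso timestamp']
```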

Ordering regressions (MED)

A stable block appearing after more dynamic content, which shortens the cacheable prefix.
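One way to detect such a regression is to give each block kind a stability rank and flag any block that ranks lower than something already seen. The kind names and rank values below are hypothetical; only the stable-to-dynamic direction comes from the text.

```python
# Lower rank means more stable. Kind names are assumptions for this sketch.
STABILITY_RANK = {
    "system": 0, "tools": 1, "schema": 2, "context": 3,
    "history": 4, "retrieval": 5, "user_input": 6, "tool_result": 7,
}


def ordering_regressions(kinds):
    """Return indexes of blocks that follow more dynamic content."""
    regressions = []
    highest_seen = -1
    for index, kind in enumerate(kinds):
        rank = STABILITY_RANK[kind]
        if rank < highest_seen:
            regressions.append(index)
        highest_seen = max(highest_seen, rank)
    return regressions


print(ordering_regressions(["system", "tools", "history", "user_input"]))    # []
print(ordering_regressions(["system", "retrieval", "context", "user_input"]))  # [2]
```

The check only reports positions. It does not reorder anything, in keeping with the rule that ordering stays the author's decision.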

Volatile tail missing (LOW)

The final block is not the latest user input or tool result, so content that could be reused sits after content that changes.

Prefix too short (LOW)

A stable prefix below the roughly 1,024-token minimum that providers typically require before caching engages. The exact threshold varies by provider and model.
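The two low-severity checks can be sketched together. The four-characters-per-token estimate, the kind names, and the function names are assumptions made for this example; a real tokenizer would give different counts.

```python
MIN_PREFIX_TOKENS = 1024  # approximate minimum cited above

STABLE_KINDS = ("system", "tools", "schema", "context")
TAIL_KINDS = ("user_input", "tool_result")


def estimate_tokens(text):
    # Crude heuristic: about four characters per token for English text.
    return len(text) // 4


def low_severity_findings(blocks):
    """blocks is an ordered list of (kind, content) pairs."""
    findings = []
    if not blocks or blocks[-1][0] not in TAIL_KINDS:
        findings.append("volatile tail missing")
    # The stable prefix is the leading run of stable-kind blocks.
    prefix_text = ""
    for kind, content in blocks:
        if kind not in STABLE_KINDS:
            break
        prefix_text += content
    if estimate_tokens(prefix_text) < MIN_PREFIX_TOKENS:
        findings.append("prefix too short")
    return findings


long_policy = "Follow the support policy. " * 400  # about 2,700 estimated tokens
print(low_severity_findings([("system", long_policy), ("user_input", "hi")]))  # []
```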

Web prompt-linter

The same checks run in the browser at the prompt-linter tool - paste a prompt, read the score and findings, and iterate without the CLI. It is the fastest way to sanity-check a template before shipping it.
