Prompt layout

The recommended block ordering that preserves a reusable prefix, the reuse-killers that quietly defeat provider caching, and the zumik lint CLI and web prompt-linter that catch them.

Prompt-cache capture lives and dies by layout. Every provider rewards the same shape: keep stable content at the front so the prefix stays byte-identical across requests, and push everything that changes every call to the very end. Get the ordering right once and the discount applies on whichever provider answers. Get a volatile token into the prefix and your hit rate falls to near zero.

The recommended order

Assemble logical blocks from most stable to most dynamic:

1. system policy
2. developer policy
3. stable tool bundle
4. response schema
5. stable tenant/workspace context
6. compacted checkpoints
7. ordered branch history
8. dynamic retrieval blocks
9. latest user input
10. latest tool result

This is exactly what Zumik state encodes: a bundle is the stable prefix (blocks 1-5), a branch is the ordered middle (blocks 6-7), and the latest turn is the volatile tail (blocks 8-10). OpenAI, Anthropic, and Gemini all recommend stable-first construction for the same reason - it maximizes the cacheable prefix.
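
The mapping from the ten blocks to the three segments can be sketched as below. This is an illustration only: the kind names, `segment_of`, and `assemble` are hypothetical and are not Zumik identifiers.

```python
# Hypothetical kind names, listed in the recommended order (most stable first).
RECOMMENDED_ORDER = [
    "system_policy",     # 1
    "developer_policy",  # 2
    "tool_bundle",       # 3
    "response_schema",   # 4
    "tenant_context",    # 5
    "checkpoint",        # 6
    "branch_history",    # 7
    "retrieval",         # 8
    "user_input",        # 9
    "tool_result",       # 10
]


def segment_of(kind: str) -> str:
    """Map a block kind to bundle (1-5), branch (6-7), or latest_turn (8-10)."""
    position = RECOMMENDED_ORDER.index(kind) + 1
    if position <= 5:
        return "bundle"
    if position <= 7:
        return "branch"
    return "latest_turn"


def assemble(blocks: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (segment, kind, text) triples in the recommended order.

    Kinds that are absent from `blocks` are skipped.
    """
    return [(segment_of(k), k, blocks[k]) for k in RECOMMENDED_ORDER if k in blocks]
```

Calling `assemble` with blocks supplied in any order returns them stable-first, so the bundle always forms the leading bytes of the request.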

Zumik may recommend prefix-friendly construction, but it will not silently reorder semantically meaningful events. Ordering is your decision; the platform measures whether your decision preserves reuse and tells you when it does not.

Reuse-killers to avoid

A "reuse-killer" is anything that changes the bytes of the stable prefix from one request to the next. Once the prefix differs, the cache cannot match it, and the discount is gone.
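
A small sketch of why one changing value is enough to lose the match. It uses a SHA-256 fingerprint as a stand-in for the provider's prefix comparison, which is an assumption made for illustration: real providers match on the leading tokens of the request, not on a hash computed by the client. The `prefix_fingerprint` helper and the sample strings are hypothetical.

```python
import hashlib


def prefix_fingerprint(blocks: list[str]) -> str:
    """Hash the concatenated stable blocks; any byte change alters the digest."""
    joined = "\n".join(blocks).encode("utf-8")
    return hashlib.sha256(joined).hexdigest()


policy = "You are a support assistant. Follow the refund policy."
tools = '[{"name":"lookup_order"}]'

# Two requests with an identical prefix: the fingerprints agree.
stable_a = prefix_fingerprint([policy, tools])
stable_b = prefix_fingerprint([policy, tools])

# Two requests seven seconds apart with a timestamp in the system block.
stamped_a = prefix_fingerprint([policy + " Current time: 2024-05-01T10:00:00Z", tools])
stamped_b = prefix_fingerprint([policy + " Current time: 2024-05-01T10:00:07Z", tools])

print(stable_a == stable_b)    # True
print(stamped_a == stamped_b)  # False
```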

Reuse-killer	Why it hurts	Fix
A timestamp or "current date" in the system block	Changes every request, so the prefix never repeats	Move per-request values into the latest user turn
A request id, trace id, session id, or UUID in the prefix	Same: a unique token at the top resets the whole prefix	Keep ids out of stable blocks entirely
A freshly shuffled or re-serialized tool list	Reordered tool schema is a different prefix even with identical tools	Serialize tools once, deterministically, and reuse the bytes
Dynamic retrieval placed before stable blocks	Pushes the stable content past the cacheable boundary	Put retrieval after the stable prefix, before the latest turn
The latest user turn is not last	Content placed after it cannot be reused	Place content that changes on every request at the very end
A stable prefix below ~1,024 tokens	Below the provider minimum, no cache engages at all	Consolidate instructions, tools, and schema into the prefix

The single most common cause of a low capture rate is the first row: a volatile value sitting in an otherwise-stable system prompt.
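
For the tool-list row in the table above, one way to keep the bytes stable is to sort the tools and serialize them canonically. The sketch below assumes each tool is a dict with a top-level `name` key; that shape and the `serialize_tools` helper are assumptions for illustration, so adapt the sort key to whatever tool format you actually send.

```python
import json


def serialize_tools(tools: list[dict]) -> str:
    """Serialize a tool list so the same tools always yield the same bytes."""
    # Sort tools by name so list order in the caller does not matter.
    ordered = sorted(tools, key=lambda tool: tool["name"])
    # sort_keys fixes key order inside each tool; fixed separators remove
    # whitespace differences between serializer configurations.
    return json.dumps(ordered, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


tools_v1 = [
    {"name": "lookup_order", "description": "Find an order by id"},
    {"name": "issue_refund", "description": "Refund an order"},
]
# Same tools, shuffled, with keys in a different order.
tools_v2 = [
    {"description": "Refund an order", "name": "issue_refund"},
    {"description": "Find an order by id", "name": "lookup_order"},
]

print(serialize_tools(tools_v1) == serialize_tools(tools_v2))  # True
```

Serializing once at startup and reusing the resulting string avoids depending on the serializer at all.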

Lint your layout

Zumik ships a layout linter that reads an ordered prompt and flags these reuse-killers before you ever spend a token on a cold prefix. It checks layout, not content - whether the structure preserves a reusable prefix, not whether the instructions are any good.

zumik lint (CLI)

zumik lint accepts an OpenAI messages array, a bare message array, or Zumik blocks (the same shape with a kind field):

zumik lint prompt.json

Output

Prompt-layout score: 70/100
Stable-prefix tokens: ~1840

[HIGH] block 0 (system): stable-prefix block contains volatile content (iso timestamp)
        fix: Move per-request values (timestamps, ids, dates) into the latest user turn so the
             prefix stays byte-stable across requests.
[MED ] block 3 (user): UserInput block appears after more dynamic content; this shortens the cacheable prefix
        fix: Reorder so stable content (system, tools, schema, context) precedes history, retrieval,
             and the latest user input.

Add --json for machine-readable output to wire into CI:

{
  "layout_score": 70,
  "stable_prefix_tokens": 1840,
  "findings": [
    {
      "severity": "high",
      "block": 0,
      "role": "system",
      "message": "stable-prefix block contains volatile content (iso timestamp)",
      "fix": "Move per-request values (timestamps, ids, dates) into the latest user turn..."
    }
  ]
}
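
A minimal sketch of a CI gate built on that JSON. It relies only on the `layout_score`, `findings`, `severity`, `block`, and `message` fields shown above; the `gate` function and the `min_score` threshold of 85 are hypothetical choices, not part of the CLI.

```python
import json
import sys


def gate(report_json: str, min_score: int = 85) -> int:
    """Return a process exit code: 1 if the layout should fail the build, else 0."""
    report = json.loads(report_json)
    high = [f for f in report["findings"] if f["severity"] == "high"]
    for finding in high:
        # Surface the blocking findings in the CI log.
        print(f"block {finding['block']}: {finding['message']}", file=sys.stderr)
    if high or report["layout_score"] < min_score:
        return 1
    return 0
```

In a pipeline, the linter's JSON output would be captured and passed to `gate`, with the return value used as the job's exit status.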

The layout_score starts at 100 and subtracts per finding: 30 for a high-severity issue (volatile content in the prefix), 15 for a medium (ordering regression), 5 for a low (final block is not the latest turn, or the prefix is below the cache minimum). A clean, cache-friendly layout scores 100.
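
The arithmetic can be sketched as follows. The penalties come from the paragraph above; the `"medium"` and `"low"` spellings and the floor at zero are assumptions, since only `"high"` appears in the sample JSON and the behavior below zero is not stated.

```python
# Penalty per finding, by severity.
PENALTY = {"high": 30, "medium": 15, "low": 5}


def layout_score(severities: list[str]) -> int:
    """Start at 100 and subtract the penalty for each finding."""
    total = sum(PENALTY[severity] for severity in severities)
    # Assumption: the score does not go below zero.
    return max(0, 100 - total)


print(layout_score([]))                         # 100
print(layout_score(["high"]))                   # 70
print(layout_score(["high", "medium", "low"]))  # 50
```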

What the linter detects

Volatile content in the prefix (HIGH)

Timestamps, ISO datetimes, UUIDs, long numeric ids, and time/id keywords found inside a system, tools, schema, or context block.
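
A rough sketch of this kind of detection using regular expressions. The patterns, the ten-digit threshold for a "long numeric id", and the keyword list are assumptions for illustration; the linter's actual rules are not documented here.

```python
import re

# Hypothetical patterns, one per category of volatile content.
VOLATILE_PATTERNS = {
    "iso timestamp": re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2})?"),
    "uuid": re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
        r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    ),
    "long numeric id": re.compile(r"\b\d{10,}\b"),
    "time/id keyword": re.compile(
        r"\b(current date|current time|request[_ ]id|trace[_ ]id|session[_ ]id)\b",
        re.IGNORECASE,
    ),
}


def find_volatile(text: str) -> list[str]:
    """Return the labels of every volatile pattern found in a stable block."""
    return [label for label, pattern in VOLATILE_PATTERNS.items() if pattern.search(text)]
```

For example, `find_volatile("You are a helpful assistant.")` returns an empty list, while a system prompt containing `2024-05-01T10:00:00Z` is reported under `iso timestamp`.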

Ordering regressions (MED)

A stable block appearing after more dynamic content, which shortens the cacheable prefix.
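
One way to detect such a regression is to give each kind a stability rank and flag any block whose rank is lower than one already seen. The kind names and ranks below are hypothetical.

```python
# Hypothetical ranks: lower means more stable, so it should come earlier.
RANK = {
    "system": 0,
    "tools": 1,
    "schema": 2,
    "context": 3,
    "checkpoint": 4,
    "history": 5,
    "retrieval": 6,
    "user_input": 7,
    "tool_result": 8,
}


def ordering_regressions(kinds: list[str]) -> list[int]:
    """Return the indexes of blocks that appear after more dynamic content."""
    regressions = []
    highest_seen = -1
    for index, kind in enumerate(kinds):
        rank = RANK[kind]
        if rank < highest_seen:
            # A more dynamic block already appeared before this one.
            regressions.append(index)
        else:
            highest_seen = rank
    return regressions


print(ordering_regressions(["system", "retrieval", "tools", "user_input"]))  # [2]
```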

Volatile tail missing (LOW)

The final block is not the latest user input or tool result, so content that could be reused sits after content that changes.
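
As a sketch, the check reduces to inspecting the kind of the last block. The kind names are hypothetical.

```python
# Hypothetical kinds that count as the volatile tail.
TAIL_KINDS = {"user_input", "tool_result"}


def has_volatile_tail(kinds: list[str]) -> bool:
    """True when the final block is the latest user input or tool result."""
    return bool(kinds) and kinds[-1] in TAIL_KINDS


print(has_volatile_tail(["system", "tools", "user_input"]))  # True
print(has_volatile_tail(["system", "user_input", "tools"]))  # False
```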

Prefix too short (LOW)

A stable prefix below the roughly 1,024-token minimum that providers typically require before caching engages. The exact threshold varies by provider and model.
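
A rough sketch of the check. It estimates tokens at about four characters each, which is a common heuristic and not a real tokenizer; the kind names and helper functions are hypothetical.

```python
# Hypothetical kinds that count toward the stable prefix.
STABLE_KINDS = {"system", "tools", "schema", "context"}
MIN_PREFIX_TOKENS = 1024


def estimate_tokens(text: str) -> int:
    """Approximate token count, assuming roughly four characters per token."""
    return (len(text) + 3) // 4


def stable_prefix_tokens(blocks: list[tuple[str, str]]) -> int:
    """Sum estimated tokens over the leading run of stable blocks."""
    total = 0
    for kind, text in blocks:
        if kind not in STABLE_KINDS:
            break  # the prefix ends at the first dynamic block
        total += estimate_tokens(text)
    return total


def prefix_too_short(blocks: list[tuple[str, str]]) -> bool:
    return stable_prefix_tokens(blocks) < MIN_PREFIX_TOKENS
```

Because the estimate is approximate, a prefix close to the threshold is worth checking against the provider's own token counts.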