Prompt linter

zumik lint and the web prompt-linter check a prompt's layout for structures that defeat provider-native caching: volatile content in the stable prefix, bad ordering, and a stable prefix shorter than ~1024 tokens.

The prompt linter reads an ordered prompt and flags layout choices that quietly defeat provider-native prompt caching. It checks structure, not content quality: whether your prompt preserves a reusable, byte-stable prefix. Its rules encode the prompt-layout ordering that every cache-aware request should follow.

It runs as the zumik lint subcommand and as a web prompt-linter; both apply the same checks.

Run it

zumik lint prompt.json

It accepts three input shapes:

an OpenAI chat envelope, {"messages": [...]}
a bare message array, [ { "role": ..., "content": ... }, ... ]
Zumik blocks: the same shape plus an optional kind field (e.g. tools, response_schema, document)
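A minimal Python sketch of accepting these three shapes, assuming only what the list above states. load_blocks is a hypothetical helper, not zumik's parser, and requiring every block to carry a role or a kind is an assumption made for illustration.

```python
import json
from typing import Any


def load_blocks(text: str) -> list[dict[str, Any]]:
    """Return the ordered blocks from any of the three accepted input shapes."""
    doc = json.loads(text)
    if isinstance(doc, dict) and isinstance(doc.get("messages"), list):
        blocks = doc["messages"]  # OpenAI chat envelope: {"messages": [...]}
    elif isinstance(doc, list):
        blocks = doc  # bare message array, or Zumik blocks with an optional "kind"
    else:
        raise ValueError('expected {"messages": [...]} or a JSON array of blocks')
    for i, block in enumerate(blocks):
        # Assumed validation: a block needs a role or a kind to be classified.
        if not isinstance(block, dict) or not ("role" in block or "kind" in block):
            raise ValueError(f"block {i} has neither a role nor a kind")
    return blocks
```

Both the envelope and the bare array reduce to the same ordered list, so every later check can work on one representation.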

Each block is placed on a stability spectrum by its role (or by its kind, when present). From most to least stable: system and developer policy, tools, schema, stable context, history, retrieval, and finally the latest user input or tool result. The first four tiers make up the cacheable stable prefix.
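The classification and the resulting stable prefix can be sketched as follows. The tier numbers, the KIND_TIERS table (including placing document in stable context), and the rule that only the final user or tool block counts as the latest turn are all assumptions; zumik's own mapping may differ.

```python
from typing import Any

STABLE_MAX_TIER = 3  # tiers 0-3: policy, tools, schema, stable context

# Assumed mapping from an explicit kind to a tier (0 = most stable, 6 = most volatile).
KIND_TIERS = {
    "tools": 1,
    "schema": 2,
    "response_schema": 2,
    "document": 3,  # assumption: documents treated as stable context
    "history": 4,
    "retrieval": 5,
}


def tier_of(block: dict[str, Any], is_last: bool) -> int:
    """Place one block on the stability spectrum."""
    kind = block.get("kind")
    if kind in KIND_TIERS:
        return KIND_TIERS[kind]
    role = block.get("role")
    if role in ("system", "developer"):
        return 0  # system and developer policy
    if role in ("user", "tool") and is_last:
        return 6  # latest user input or tool result
    return 4  # any earlier turn counts as history


def stable_prefix_tokens(blocks: list[dict[str, Any]]) -> int:
    """Whitespace token estimate of the leading run of stable-tier blocks."""
    total = 0
    for i, block in enumerate(blocks):
        if tier_of(block, is_last=(i == len(blocks) - 1)) > STABLE_MAX_TIER:
            break  # the cacheable prefix ends at the first dynamic block
        # Non-string content (e.g. a list of parts) is stringified as a rough estimate.
        total += len(str(block.get("content", "")).split())
    return total
```

Stopping at the first dynamic block is what makes an out-of-order stable block cost you: it no longer counts toward the prefix.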

The checks

Check	Severity	What it catches
Volatile content in the stable prefix	HIGH (-30)	A timestamp, date, UUID, long numeric id, or a `request_id` / `session id` / `current date` keyword inside a stable-prefix block. Anything that changes per request invalidates the cached prefix on every call.
Ordering regression	MEDIUM (-15)	A stable block appearing after more dynamic content. Each regression shortens the cacheable prefix.
Latest turn not last	LOW (-5)	The final block is not the latest user input or tool result, so reusable content sits after the volatile turn.
Stable prefix below the cache minimum	LOW (-5)	The estimated stable prefix is under ~1024 tokens, the typical minimum providers require before caching engages. Below it, even a perfectly ordered prefix is not cached.
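The three structural checks reduce to one pass over the blocks' tiers. This sketch takes tiers as plain integers (0 for system policy through 6 for the latest turn, an assumed numbering) plus a precomputed prefix-token estimate. layout_findings is a hypothetical name, and flagging any stable block that follows a higher tier is one reading of "ordering regression", not zumik's confirmed rule.

```python
from typing import Optional

STABLE_MAX_TIER = 3      # tiers 0-3 make up the stable prefix
LATEST_TIER = 6          # latest user input or tool result
CACHE_MIN_TOKENS = 1024  # approximate provider caching threshold


def layout_findings(tiers: list[int], prefix_tokens: int) -> list[tuple[str, Optional[int], str]]:
    """Return (severity, block index or None, message) for the structural checks."""
    findings: list[tuple[str, Optional[int], str]] = []
    most_dynamic = -1  # highest tier seen so far
    for i, tier in enumerate(tiers):
        # Ordering regression: a stable block after something more dynamic.
        if tier <= STABLE_MAX_TIER and tier < most_dynamic:
            findings.append(("medium", i, "stable block after more dynamic content"))
        most_dynamic = max(most_dynamic, tier)
    if tiers and tiers[-1] != LATEST_TIER:
        findings.append(("low", len(tiers) - 1, "final block is not the latest turn"))
    if prefix_tokens < CACHE_MIN_TOKENS:
        # Prompt-wide finding, so no single block index applies.
        findings.append(("low", None, "stable prefix below the cache minimum"))
    return findings
```

For example, layout_findings([0, 4, 2, 6], 2000) reports one MEDIUM regression at block 2, the schema that follows history.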

The volatile-content detector is keyword- and pattern-based: it matches canonical UUIDs (8-4-4-4-12 hex), ISO 8601 timestamps (YYYY-MM-DDThh), runs of 10+ digits, and time/id keywords. Token counts are a whitespace estimate, not tokenizer-exact.
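Roughly, the detector and the token estimate look like this. The regexes follow the description above, but the exact patterns and keyword list zumik ships are not documented here, so treat them as assumptions.

```python
import re

# Assumed patterns mirroring the description; zumik's real regexes may differ.
VOLATILE_PATTERNS = {
    "uuid": re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I),
    "iso timestamp": re.compile(r"\b\d{4}-\d{2}-\d{2}T\d{2}"),  # YYYY-MM-DDThh
    "long numeric id": re.compile(r"\d{10,}"),  # runs of 10+ digits
    "time/id keyword": re.compile(r"request[_ ]id|session[_ ]id|current date", re.I),
}


def volatile_matches(text: str) -> list[str]:
    """Names of every volatile pattern found in one block's text."""
    return [name for name, pattern in VOLATILE_PATTERNS.items() if pattern.search(text)]


def estimate_tokens(text: str) -> int:
    """Whitespace token estimate: cheap and deterministic, not tokenizer-exact."""
    return len(text.split())
```

Because the patterns are purely lexical, a date-only string such as 2026-06-15 does not match the timestamp pattern, and a 12-digit UUID tail also trips the long-numeric rule.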

Reading the report

The layout score starts at 100 and subtracts each finding's penalty (HIGH 30, MEDIUM 15, LOW 5), floored at 0.
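The scoring rule is plain arithmetic; a sketch (layout_score is a hypothetical name):

```python
PENALTY = {"high": 30, "medium": 15, "low": 5}


def layout_score(severities: list[str]) -> int:
    """Start at 100, subtract each finding's penalty, and floor the result at 0."""
    return max(0, 100 - sum(PENALTY[s] for s in severities))


print(layout_score(["high", "low"]))  # 65
```

One HIGH and one LOW finding give 100 - 30 - 5 = 65, the score in the sample report; four HIGH findings would floor at 0 rather than go negative.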

Output

Prompt-layout score: 65/100
Stable-prefix tokens: ~1280

[HIGH] block 0 (system): stable-prefix block contains volatile content (iso timestamp)
        fix: Move per-request values (timestamps, ids, dates) into the latest user turn so the prefix stays byte-stable across requests.
[LOW ] block 2 (assistant): the final block is not the latest user input / tool result
        fix: Place the dynamic, changes-every-request content last so everything before it can be reused.

A clean prompt scores 100 and prints a single confirmation line in place of findings:

Clean

Prompt-layout score: 100/100
Stable-prefix tokens: ~1180

No layout issues found. The stable prefix is reuse-friendly.

Machine-readable output

--json emits the score, the estimated stable-prefix token count, and the findings:

zumik lint prompt.json --json

{
  "layout_score": 65,
  "stable_prefix_tokens": 1280,
  "findings": [
    {
      "severity": "high",
      "block": 0,
      "role": "system",
      "message": "stable-prefix block contains volatile content (iso timestamp)",
      "fix": "Move per-request values (timestamps, ids, dates) into the latest user turn so the prefix stays byte-stable across requests."
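    },
    {
      "severity": "low",
      "block": 2,
      "role": "assistant",
      "message": "the final block is not the latest user input / tool result",
      "fix": "Place the dynamic, changes-every-request content last so everything before it can be reused."
    }
  ]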
}
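If you gate CI on the report, a check like the following works against the JSON shape above. The field names come from the sample output; passes_gate, the min_score threshold, and treating any HIGH finding as blocking are hypothetical policy choices, not zumik behavior.

```python
import json

BLOCKING = {"high"}  # hypothetical policy: any HIGH finding fails the build


def passes_gate(report_text: str, min_score: int = 80) -> bool:
    """Decide pass/fail from the text of a `zumik lint ... --json` report."""
    report = json.loads(report_text)
    if report["layout_score"] < min_score:
        return False
    # Even a high overall score fails if a blocking-severity finding is present.
    return not any(f["severity"] in BLOCKING for f in report["findings"])
```

Save the output of zumik lint prompt.json --json and pass its text to passes_gate. Whether zumik lint itself sets a non-zero exit code on findings is not covered here, so this sketch decides from the report alone.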

The fix pattern

Every finding points the same direction: keep stable content at the front and byte-stable across requests, and put everything that changes last.

Reuse-friendly layout

{
  "messages": [
    { "role": "system",  "content": "<repository policy, tool rules - unchanged across turns>" },
    { "role": "system",  "kind": "tools",  "content": "<tool definitions>" },
    { "role": "system",  "kind": "schema", "content": "<response schema>" },
    { "role": "user",    "content": "Today is 2026-06-15. Review the latest patch." }
  ]
}

The per-request value (the date) lives in the last turn, so the system, tools, and schema blocks form a stable prefix the provider can cache. See prompt layout for the full ordering, and workload diagnostics for measuring the reuse you actually capture once the layout is clean.
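To confirm a layout like this stays byte-stable, you can build two requests on different days and compare their leading messages. build_request and shared_prefix_len are hypothetical helpers, the placeholder strings stand in for real policy, tools, and schema, and comparing json.dumps(..., sort_keys=True) output only approximates the bytes a client library actually sends.

```python
import json
from datetime import date

# Placeholder stable blocks, matching the reuse-friendly layout above.
STABLE_PREFIX = [
    {"role": "system", "content": "<repository policy, tool rules - unchanged across turns>"},
    {"role": "system", "kind": "tools", "content": "<tool definitions>"},
    {"role": "system", "kind": "schema", "content": "<response schema>"},
]


def build_request(task: str, today: date) -> dict:
    """Put the per-request value (the date) only in the final user turn."""
    latest = {"role": "user", "content": f"Today is {today.isoformat()}. {task}"}
    return {"messages": STABLE_PREFIX + [latest]}


def shared_prefix_len(a: dict, b: dict) -> int:
    """Count leading messages that serialize identically in both requests."""
    n = 0
    for x, y in zip(a["messages"], b["messages"]):
        if json.dumps(x, sort_keys=True) != json.dumps(y, sort_keys=True):
            break
        n += 1
    return n


r1 = build_request("Review the latest patch.", date(2026, 6, 15))
r2 = build_request("Review the latest patch.", date(2026, 6, 16))
print(shared_prefix_len(r1, r2))  # 3
```

All three stable blocks match across days; only the final user turn differs. Moving the date into the first system block would drop the shared prefix to 0.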

Tip

Lint before you measure. A bad layout caps how much reuse is even possible, so fixing ordering and volatile content first makes the Workload Reuse Score reflect your real opportunity.