Prompt linter
zumik lint and the web prompt-linter check a prompt's layout for the structure that defeats provider-native caching - volatile content in the stable prefix, bad ordering, and a sub-1024-token prefix.
The prompt linter reads an ordered prompt and flags the layout that quietly defeats provider-native prompt caching. It checks structure, not content quality: whether your prompt preserves a reusable, byte-stable prefix. The rules behind it are the prompt-layout ordering every cache-aware request should follow.
It runs as the zumik lint subcommand and as a web prompt-linter; both apply the same checks.
Run it
zumik lint prompt.json
It accepts three input shapes:
- an OpenAI chat envelope, {"messages": [...]}
- a bare message array, [ { "role": ..., "content": ... }, ... ]
- Zumik blocks - the same shape with an optional kind field (e.g. tools, response_schema, document)
Each block is classified onto a stability spectrum from role (or kind when present): system and developer policy, then tools, then schema, then stable context, then history, then retrieval, then the latest user input or tool result. The first four are the cacheable stable prefix.
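To make the spectrum concrete, here is a minimal Python sketch of how blocks could be mapped onto it and how an ordering regression could be spotted. It is not Zumik's implementation: the tier names, the KIND_TO_TIER and ROLE_TO_TIER tables, and both function names are assumptions made for illustration, and only the order of the tiers comes from the description above.

```python
# Tier order taken from the text above; a lower index means more stable.
SPECTRUM = ["policy", "tools", "schema", "stable_context",
            "history", "retrieval", "latest_turn"]
STABLE_PREFIX_TIERS = set(SPECTRUM[:4])  # the first four tiers are cacheable

# Hypothetical lookup tables, assumed for this sketch only.
KIND_TO_TIER = {
    "tools": "tools",
    "schema": "schema",
    "response_schema": "schema",
    "document": "stable_context",
    "retrieval": "retrieval",
}
ROLE_TO_TIER = {
    "system": "policy",
    "developer": "policy",
    "user": "history",
    "assistant": "history",
    "tool": "history",
}


def classify(blocks):
    """Return one tier name per block; kind wins over role when present."""
    tiers = []
    for index, block in enumerate(blocks):
        kind = block.get("kind")
        role = block.get("role")
        if kind in KIND_TO_TIER:
            tier = KIND_TO_TIER[kind]
        else:
            tier = ROLE_TO_TIER.get(role, "history")
        # A trailing user or tool block without a kind is the latest turn.
        is_last = index == len(blocks) - 1
        if is_last and kind is None and role in ("user", "tool"):
            tier = "latest_turn"
        tiers.append(tier)
    return tiers


def ordering_regressions(tiers):
    """Indexes of blocks that are more stable than the block before them."""
    ranks = [SPECTRUM.index(tier) for tier in tiers]
    return [i for i in range(1, len(ranks)) if ranks[i] < ranks[i - 1]]


blocks = [
    {"role": "system", "content": "policy"},
    {"role": "system", "kind": "tools", "content": "tool definitions"},
    {"role": "user", "content": "earlier question"},
    {"role": "assistant", "content": "earlier answer"},
    {"role": "user", "content": "latest question"},
]
tiers = classify(blocks)
print(tiers)  # ['policy', 'tools', 'history', 'history', 'latest_turn']
print(ordering_regressions(tiers))  # []
```

In this sketch a system block placed after a user turn would be reported as a regression, because its tier ranks lower than the tier of the block before it.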
The checks
| Check | Severity | What it catches |
|---|---|---|
| Volatile content in the stable prefix | HIGH (-30) | A timestamp, date, UUID, long numeric id, or a request_id / session id / current date keyword inside a stable-prefix block. Anything that changes per request busts the cached prefix every call. |
| Ordering regression | MEDIUM (-15) | A stable block appearing after more dynamic content. Each regression shortens the cacheable prefix. |
| Latest turn not last | LOW (-5) | The final block is not the latest user input or tool result, so reusable content sits behind the volatile turn. |
| Stable prefix below the cache minimum | LOW (-5) | The stable prefix is under ~1024 tokens, the threshold providers require before caching engages. Below it, even a perfectly ordered prefix is not cached. |
The volatile-content detector is keyword- and pattern-based: it matches canonical UUIDs (8-4-4-4-12 hex), ISO 8601 timestamps (YYYY-MM-DDThh), runs of 10+ digits, and time/id keywords. Token counts are a whitespace estimate, not tokenizer-exact.
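The patterns described above can be approximated with a few regular expressions. The sketch below is an illustration under assumptions, not the linter's actual rule set: the exact expressions, the keyword list, and the function names are hypothetical, chosen only to match the categories named in the paragraph.

```python
import re

# Canonical UUID: 8-4-4-4-12 hexadecimal groups.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)
# ISO 8601 timestamp prefix: YYYY-MM-DDThh.
ISO_TIMESTAMP_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}T\d{2}")
# Any run of ten or more digits.
LONG_DIGITS_RE = re.compile(r"\d{10,}")
# Assumed keyword list; the real linter may match more or different terms.
KEYWORD_RE = re.compile(r"request_id|session[ _]id|current date", re.IGNORECASE)

CHECKS = [
    ("uuid", UUID_RE),
    ("iso timestamp", ISO_TIMESTAMP_RE),
    ("long numeric id", LONG_DIGITS_RE),
    ("keyword", KEYWORD_RE),
]


def volatile_reasons(text):
    """Return the name of every pattern that matches the block text."""
    return [name for name, pattern in CHECKS if pattern.search(text)]


def estimate_tokens(text):
    """Whitespace-based estimate; not the same as a tokenizer count."""
    return len(text.split())


print(volatile_reasons("Now: 2026-06-15T09:30:00Z"))         # ['iso timestamp']
print(volatile_reasons("You are a careful code reviewer."))  # []
print(estimate_tokens("You are a careful code reviewer."))   # 6
```

Pattern matching like this is deliberately coarse: a UUID whose last group happens to contain ten consecutive digits would also trip the long-digit rule, so a single value can produce more than one reason.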
Reading the report
The layout score starts at 100 and subtracts each finding's penalty (HIGH 30, MEDIUM 15, LOW 5), floored at 0.
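The arithmetic is small enough to sketch in a few lines of Python. The penalty values come from the table above; the function name and the list-of-severities input are assumptions for illustration.

```python
# Penalties per severity, as listed in the checks table.
PENALTIES = {"high": 30, "medium": 15, "low": 5}


def layout_score(severities):
    """Start at 100, subtract each finding's penalty, never go below 0."""
    return max(0, 100 - sum(PENALTIES[s] for s in severities))


print(layout_score(["high", "low"]))  # 65
print(layout_score([]))               # 100
print(layout_score(["high"] * 4))     # 0
```

One HIGH finding plus one LOW finding gives 100 - 30 - 5 = 65.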
Prompt-layout score: 65/100
Stable-prefix tokens: ~1280
[HIGH] block 0 (system): stable-prefix block contains volatile content (iso timestamp)
fix: Move per-request values (timestamps, ids, dates) into the latest user turn so the prefix stays byte-stable across requests.
[LOW ] block 2 (assistant): the final block is not the latest user input / tool result
fix: Place the dynamic, changes-every-request content last so everything before it can be reused.
A clean prompt prints a single line and a score of 100:
Prompt-layout score: 100/100
Stable-prefix tokens: ~1180
No layout issues found. The stable prefix is reuse-friendly.
Machine-readable output
--json emits the score, the estimated stable-prefix token count, and the findings:
zumik lint prompt.json --json
{
"layout_score": 65,
"stable_prefix_tokens": 1280,
"findings": [
{
"severity": "high",
"block": 0,
"role": "system",
"message": "stable-prefix block contains volatile content (iso timestamp)",
"fix": "Move per-request values (timestamps, ids, dates) into the latest user turn so the prefix stays byte-stable across requests."
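},
{
"severity": "low",
"block": 2,
"role": "assistant",
"message": "the final block is not the latest user input / tool result",
"fix": "Place the dynamic, changes-every-request content last so everything before it can be reused."
}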
]
}
The fix pattern
Every finding points the same direction: keep stable content at the front and byte-stable across requests, and put everything that changes last.
{
"messages": [
{ "role": "system", "content": "<repository policy, tool rules - unchanged across turns>" },
{ "role": "system", "kind": "tools", "content": "<tool definitions>" },
{ "role": "system", "kind": "schema", "content": "<response schema>" },
{ "role": "user", "content": "Today is 2026-06-15. Review the latest patch." }
]
}
The per-request value (the date) lives in the last turn, so the system, tools, and schema blocks form a stable prefix the provider can cache. See prompt layout for the full ordering and workload diagnostics for measuring the reuse you actually capture once the layout is clean.
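One way to check this locally is to build the same request for two different days and confirm that everything before the last turn serializes to identical bytes. The sketch below assumes a hypothetical build_messages helper and uses a SHA-256 hash as a stand-in for byte-stability; providers match on their own rendering of the prompt, so treat it as a sanity check and not as a guarantee of a cache hit.

```python
import hashlib
import json

# Stable blocks, defined once and reused unchanged for every request.
STABLE_PREFIX = [
    {"role": "system", "content": "<repository policy, tool rules>"},
    {"role": "system", "kind": "tools", "content": "<tool definitions>"},
    {"role": "system", "kind": "schema", "content": "<response schema>"},
]


def build_messages(today, task):
    """Append the per-request values as the final user turn."""
    latest_turn = {"role": "user", "content": f"Today is {today}. {task}"}
    return STABLE_PREFIX + [latest_turn]


def prefix_hash(messages):
    """Hash every block except the last, using deterministic JSON."""
    prefix = json.dumps(messages[:-1], sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


first = build_messages("2026-06-15", "Review the latest patch.")
second = build_messages("2026-06-16", "Review the latest patch.")

print(prefix_hash(first) == prefix_hash(second))  # True
print(first[-1]["content"] == second[-1]["content"])  # False
```

If the date were interpolated into the first system block instead, the two hashes would differ, which is the situation the HIGH finding reports.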
Tip
Lint before you measure. A bad layout caps how much reuse is even possible, so fixing ordering and volatile content first makes the Workload Reuse Score reflect your real opportunity.