zumik CLI
The zumik command runs the workload-analysis funnel locally - score, diagnose, lint, and proxy - before any data leaves your environment.
zumik is the command-line companion to the hosted workload diagnostics. It runs the whole analysis funnel on your machine: capture metadata traces, score them, build the full diagnostic, and lint a prompt's layout. Nothing leaves your environment unless you explicitly point a command at a live deployment.
Install
cargo install --path tools/zumik-cli
# or, from the repo, without installing:
cargo run -p zumik-cli -- <command>The binary is named zumik. It is a Rust crate (zumik-cli 0.1.0) and shares the scoring engine with the hosted diagnostic, so the numbers match.
The funnel
Capture
Put zumik proxy in front of your OpenAI-compatible endpoint and run a representative slice of traffic. It writes one metadata-only trace per request.
Score
zumik score turns the trace bundle into a Workload Reuse Score with its interpretation band and recommended action.
Diagnose
zumik diagnose builds the full report: the reuse waterfall and the lowest-complexity execution profile the evidence supports.
Lint
zumik lint checks a prompt's layout for the structure that quietly defeats provider-native caching.
Add --json to score, diagnose, or lint for machine-readable output.
zumik proxy
Sit in front of any OpenAI-compatible endpoint and record one metadata-only trace per request: token estimates, timing, a stable-prefix fingerprint, and the recurring prefix family it belongs to. Raw prompt text is never written.
zumik proxy --upstream https://api.openai.com --listen 127.0.0.1:8080 --out workload.jsonl| Flag | Default | Purpose |
|---|---|---|
--upstream | (required) | Base URL to forward to, e.g. https://api.openai.com |
--listen | 127.0.0.1:8080 | Address to bind |
--out | zumik-traces.jsonl | JSONL file to append traces to |
Point your OpenAI client's base URL at http://127.0.0.1:8080, run normal traffic, then stop the proxy. The output file is the trace bundle for the other commands. See Trace-capture proxy for exactly what it records and the privacy guarantees.
zumik score
Compute the Workload Reuse Score from a trace bundle (JSON array or JSONL).
zumik score workload.jsonlWorkload Reuse Score: 63.4 / 100 (plausible fit)
Recommended action: run diagnostic and provider tuning
Traces analyzed: 420
Components (weight × value):
opportunity_ratio 0.35 × 0.82 = 28.7
recurrence_score 0.20 × 0.74 = 14.8
retention_locality 0.15 × 0.61 = 9.2
ttft_sensitivity 0.15 × 0.40 = 6.0
session_continuity 0.10 × 0.30 = 3.0
payload_redundancy 0.05 × 0.34 = 1.7The six components are weighted exactly as the plan fixes them; the table prints weight × value and the points each contributes. Deployment feasibility is deliberately excluded - long prompts alone never recommend BYOC.
zumik diagnose
Build the full Agent Workload Efficiency Diagnostic from a trace bundle. By default it runs locally so you can read the report before any data is sent.
zumik diagnose workload.jsonlWorkload Reuse Score: 78.2 / 100 (prioritize optimization pilot)
Recommended profile: managed-provider tuning
Reuse waterfall:
Total input tokens 7560000 100.0% ████████████████████
Eligible reuse 6210000 82.1% ████████████████
Candidate reuse 6210000 82.1% ████████████████
Realized reused 5040000 66.7% █████████████
Missed opportunity 1170000 15.5% ███
Notes:
- Of 6210000 candidate reusable tokens, 5040000 were captured (81% capture rate).
- 1170000 tokens of reuse opportunity were missed; investigate prompt ordering and cache-key strategy before changing infrastructure.To store the run on a live deployment instead of computing locally, pass --api-key (or set ZUMIK_API_KEY); the CLI then calls the deployment's diagnostics endpoint and prints the stored report.
zumik diagnose workload.jsonl --api-key zk_live_...
# or: ZUMIK_API_KEY=zk_live_... zumik diagnose workload.jsonl| Flag | Default | Purpose |
|---|---|---|
--json | off | Emit the raw report JSON |
--api-key | from ZUMIK_API_KEY | Run against a live deployment and store the report |
--base-url | https://api.zumik.ai | API host (only used with --api-key) |
zumik lint
Check a prompt's layout for the structure that defeats provider-native prompt caching: volatile content in the stable prefix, dynamic blocks ahead of stable ones, the latest user turn not last, and a stable prefix too short to be cache-eligible.
zumik lint prompt.jsonIt accepts {"messages":[...]}, a bare message array, or Zumik blocks with a kind field.
Prompt-layout score: 65/100
Stable-prefix tokens: ~1280
[HIGH] block 0 (system): stable-prefix block contains volatile content (iso timestamp)
fix: Move per-request values (timestamps, ids, dates) into the latest user turn so the prefix stays byte-stable across requests.
[LOW ] block 2 (assistant): the final block is not the latest user input / tool result
fix: Place the dynamic, changes-every-request content last so everything before it can be reused.See Prompt linter for the full set of checks and how the score is computed, and Prompt layout for the ordering rules behind them.
See also
Rust SDK
An async reqwest-based client covering /v1 responses and the core /v2 state and diagnostics calls, returning serde_json::Value.
trace-analyzer
The @zumik/trace-analyzer npm tool turns a metadata-only trace bundle into a reuse waterfall and Workload Reuse Score, with no Rust toolchain and no raw prompts.