Zumik
CLI & tools

zumik CLI

The zumik command runs the workload-analysis funnel locally - score, diagnose, lint, and proxy - before any data leaves your environment.

zumik is the command-line companion to the hosted workload diagnostics. It runs the whole analysis funnel on your machine: capture metadata traces, score them, build the full diagnostic, and lint a prompt's layout. Nothing leaves your environment unless you explicitly point a command at a live deployment.

Install

cargo install --path tools/zumik-cli
# or, from the repo, without installing:
cargo run -p zumik-cli -- <command>

The binary is named zumik. It is a Rust crate (zumik-cli 0.1.0) and shares the scoring engine with the hosted diagnostic, so the numbers match.

The funnel

Capture

Put zumik proxy in front of your OpenAI-compatible endpoint and run a representative slice of traffic. It writes one metadata-only trace per request.

Score

zumik score turns the trace bundle into a Workload Reuse Score with its interpretation band and recommended action.

Diagnose

zumik diagnose builds the full report: the reuse waterfall and the lowest-complexity execution profile the evidence supports.

Lint

zumik lint checks a prompt's layout for the structure that quietly defeats provider-native caching.

Add --json to score, diagnose, or lint for machine-readable output.

zumik proxy

Sit in front of any OpenAI-compatible endpoint and record one metadata-only trace per request: token estimates, timing, a stable-prefix fingerprint, and the recurring prefix family it belongs to. Raw prompt text is never written.

zumik proxy --upstream https://api.openai.com --listen 127.0.0.1:8080 --out workload.jsonl
FlagDefaultPurpose
--upstream(required)Base URL to forward to, e.g. https://api.openai.com
--listen127.0.0.1:8080Address to bind
--outzumik-traces.jsonlJSONL file to append traces to

Point your OpenAI client's base URL at http://127.0.0.1:8080, run normal traffic, then stop the proxy. The output file is the trace bundle for the other commands. See Trace-capture proxy for exactly what it records and the privacy guarantees.

zumik score

Compute the Workload Reuse Score from a trace bundle (JSON array or JSONL).

zumik score workload.jsonl
Output
Workload Reuse Score: 63.4 / 100   (plausible fit)
Recommended action:  run diagnostic and provider tuning
Traces analyzed:     420

Components (weight × value):
  opportunity_ratio    0.35 × 0.82  =  28.7
  recurrence_score     0.20 × 0.74  =  14.8
  retention_locality   0.15 × 0.61  =   9.2
  ttft_sensitivity     0.15 × 0.40  =   6.0
  session_continuity   0.10 × 0.30  =   3.0
  payload_redundancy   0.05 × 0.34  =   1.7

The six components are weighted exactly as the plan fixes them; the table prints weight × value and the points each contributes. Deployment feasibility is deliberately excluded - long prompts alone never recommend BYOC.

zumik diagnose

Build the full Agent Workload Efficiency Diagnostic from a trace bundle. By default it runs locally so you can read the report before any data is sent.

zumik diagnose workload.jsonl
Output
Workload Reuse Score: 78.2 / 100   (prioritize optimization pilot)
Recommended profile:  managed-provider tuning

Reuse waterfall:
  Total input tokens        7560000  100.0%  ████████████████████
  Eligible reuse            6210000   82.1%  ████████████████
  Candidate reuse           6210000   82.1%  ████████████████
  Realized reused           5040000   66.7%  █████████████
  Missed opportunity        1170000   15.5%  ███

Notes:
  - Of 6210000 candidate reusable tokens, 5040000 were captured (81% capture rate).
  - 1170000 tokens of reuse opportunity were missed; investigate prompt ordering and cache-key strategy before changing infrastructure.

To store the run on a live deployment instead of computing locally, pass --api-key (or set ZUMIK_API_KEY); the CLI then calls the deployment's diagnostics endpoint and prints the stored report.

zumik diagnose workload.jsonl --api-key zk_live_...
# or: ZUMIK_API_KEY=zk_live_... zumik diagnose workload.jsonl
FlagDefaultPurpose
--jsonoffEmit the raw report JSON
--api-keyfrom ZUMIK_API_KEYRun against a live deployment and store the report
--base-urlhttps://api.zumik.aiAPI host (only used with --api-key)

zumik lint

Check a prompt's layout for the structure that defeats provider-native prompt caching: volatile content in the stable prefix, dynamic blocks ahead of stable ones, the latest user turn not last, and a stable prefix too short to be cache-eligible.

zumik lint prompt.json

It accepts {"messages":[...]}, a bare message array, or Zumik blocks with a kind field.

Output
Prompt-layout score: 65/100
Stable-prefix tokens: ~1280

[HIGH] block 0 (system): stable-prefix block contains volatile content (iso timestamp)
        fix: Move per-request values (timestamps, ids, dates) into the latest user turn so the prefix stays byte-stable across requests.
[LOW ] block 2 (assistant): the final block is not the latest user input / tool result
        fix: Place the dynamic, changes-every-request content last so everything before it can be reused.

See Prompt linter for the full set of checks and how the score is computed, and Prompt layout for the ordering rules behind them.

See also

On this page