Workload diagnostics

Capture metadata traces, run the Agent Workload Efficiency Diagnostic, read the Workload Reuse Score and reuse waterfall, and get a recommended execution profile - before you change any infrastructure.

The Agent Workload Efficiency Diagnostic scores how much reuse your traffic actually contains and tells you what to do about it. It runs on metadata - lengths, timing, fingerprints, lineage - so you never have to hand over raw prompt text to find out whether reuse is worth chasing.

The four steps

Capture traces

Export metadata traces from your existing traffic. Each trace records what was requested and what was observed - token counts, timing, the resolved target, and how much was reusable vs. reused. The fastest way to capture without instrumenting your app is the CLI proxy, which sits in front of an OpenAI-compatible endpoint and writes metadata-only traces to a file.
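Here is one way to picture "metadata-only": a minimal sketch that assembles a single trace envelope in the shape used by the /v2/diagnostics example below. Only the field names come from that example. The helpers `fingerprint_prefix` and `make_trace` are hypothetical, and deriving `prefix_family_id` from a SHA-256 of the shared prefix is an illustrative assumption. The zumik proxy does not necessarily compute it this way.

```python
import hashlib
import json


def fingerprint_prefix(prefix_text: str) -> str:
    """Hypothetical: derive a stable family id from the shared prompt prefix.

    Only the hash leaves this function, so the raw text is never stored.
    """
    digest = hashlib.sha256(prefix_text.encode("utf-8")).hexdigest()
    return "pf_" + digest[:16]


def make_trace(trace_id, prefix_text, session_id, resolved_target, *,
               ttft_ms, latency_ms, input_tokens, candidate_reuse_tokens,
               realized_reused_tokens, output_tokens, attempt_count=1):
    """Build one metadata trace envelope (field names from the documented example)."""
    return {
        "trace_id": trace_id,
        "privacy_mode": "metadata",
        "prefix_family_id": fingerprint_prefix(prefix_text),
        "schedule": {"session_id": session_id},
        "observed": {
            "resolved_target": resolved_target,
            "ttft_ms": ttft_ms,
            "latency_ms": latency_ms,
            "input_tokens": input_tokens,
            "candidate_reuse_tokens": candidate_reuse_tokens,
            "realized_reused_tokens": realized_reused_tokens,
            "output_tokens": output_tokens,
            "attempt_count": attempt_count,
        },
    }


if __name__ == "__main__":
    trace = make_trace(
        "trc_a1", "You are a coding agent.", "ses_1", "anthropic/claude@2025-02-01",
        ttft_ms=700, latency_ms=1200, input_tokens=10000,
        candidate_reuse_tokens=9000, realized_reused_tokens=8500, output_tokens=200,
    )
    # One JSONL line: lengths, timing, and a fingerprint, but no prompt text.
    print(json.dumps(trace))
```

Two calls that share a prefix produce the same `prefix_family_id`, which is what lets recurrence be measured without the text itself.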

Score the workload

POST /v2/diagnostics computes a Workload Reuse Score (WRS) from six components, plus a band and a recommended action.

Read the reuse waterfall

The waterfall separates the reuse you could capture from the reuse you did, surfacing the missed-opportunity gap.

Act on the recommended profile

The report names the next step (usually prompt ordering, sometimes provider tuning, rarely a BYOC pilot), and you can fetch a signed copy to hand to a stakeholder.

Capture traces locally

The zumik CLI runs a metadata-only proxy in front of any OpenAI-compatible endpoint and appends one trace per request to a JSONL file - no prompt text leaves your machine:

zumik proxy --upstream https://api.openai.com --out zumik-traces.jsonl
# point your client at http://127.0.0.1:8080, run real traffic, then:
zumik diagnose zumik-traces.jsonl

zumik diagnose builds the full report locally, or runs it against a live deployment's /v2/diagnostics when you pass --api-key. See the CLI reference.
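You may want to inspect the proxy's output before diagnosing it. A few lines of Python are enough to load the JSONL file. This sketch assumes each line of zumik-traces.jsonl is one JSON object in the same trace-envelope shape that /v2/diagnostics accepts. That is an assumption, and the CLI reference is authoritative on the file format. The sketch also enforces the API's non-empty requirement before anything is sent. `load_traces` and `privacy_modes` are hypothetical helpers.

```python
import json
from pathlib import Path


def load_traces(path):
    """Read a JSONL trace file into a list of dicts, skipping blank lines.

    Assumes one trace envelope per line, as written by `zumik proxy --out`.
    """
    traces = []
    with Path(path).open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank or trailing lines
            try:
                traces.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{lineno}: invalid JSON ({exc.msg})") from exc
    if not traces:
        # /v2/diagnostics requires a non-empty `traces` array.
        raise ValueError(f"{path}: no traces found")
    return traces


def privacy_modes(traces):
    """Count traces per privacy_mode, e.g. to confirm everything is metadata-only."""
    counts = {}
    for t in traces:
        mode = t.get("privacy_mode", "<missing>")
        counts[mode] = counts.get(mode, 0) + 1
    return counts
```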

Run it

traces is a non-empty array of metadata trace envelopes. Use privacy_mode: "metadata" by default; richer modes (tokenized, encrypted_full_fidelity, synthetic) exist for replay but are not needed to score a workload.

curl https://api.zumik.ai/v2/diagnostics \
  -H "Authorization: Bearer zk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "traces": [
      {
        "trace_id": "trc_a1",
        "privacy_mode": "metadata",
        "prefix_family_id": "pf_agent_main",
        "schedule": { "session_id": "ses_1" },
        "observed": {
          "resolved_target": "anthropic/claude@2025-02-01",
          "ttft_ms": 700, "latency_ms": 1200,
          "input_tokens": 10000, "candidate_reuse_tokens": 9000,
          "realized_reused_tokens": 8500, "output_tokens": 200,
          "attempt_count": 1
        }
      }
    ]
  }'
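The same call can be made from Python with only the standard library. This is a sketch: `build_diagnostics_request` is a hypothetical helper and the API key is the same placeholder as above. The request is only constructed here. Pass it to `urllib.request.urlopen` to actually send it.

```python
import json
import urllib.request

DIAGNOSTICS_URL = "https://api.zumik.ai/v2/diagnostics"


def build_diagnostics_request(traces, api_key, url=DIAGNOSTICS_URL):
    """Construct (but do not send) a POST /v2/diagnostics request."""
    if not traces:
        raise ValueError("traces must be a non-empty array")
    body = json.dumps({"traces": traces}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


trace = {
    "trace_id": "trc_a1",
    "privacy_mode": "metadata",
    "prefix_family_id": "pf_agent_main",
    "schedule": {"session_id": "ses_1"},
    "observed": {
        "resolved_target": "anthropic/claude@2025-02-01",
        "ttft_ms": 700, "latency_ms": 1200,
        "input_tokens": 10000, "candidate_reuse_tokens": 9000,
        "realized_reused_tokens": 8500, "output_tokens": 200,
        "attempt_count": 1,
    },
}

req = build_diagnostics_request([trace], api_key="zk_live_...")
# To send it:
# with urllib.request.urlopen(req) as resp:
#     diagnostic = json.load(resp)
```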

Response

{
  "id": "dgn_01jy…",
  "object": "diagnostic",
  "project_id": "prj_01jy…",
  "created_at": "2026-06-15T12:00:00Z",
  "report": {
    "object": "diagnostic_report",
    "trace_count": 1,
    "workload_reuse_score": 78.5,
    "band": "strong_fit",
    "recommended_action": "prioritize optimization pilot",
    "components": { "opportunity_ratio": 0.9, "recurrence_score": 0.8, "retention_locality": 0.7, "ttft_sensitivity": 0.6, "session_continuity": 0.5, "payload_redundancy": 0.4 },
    "waterfall": { "total_input_tokens": 10000, "eligible_reuse_tokens": 9000, "candidate_reuse_tokens": 9000, "realized_reused_tokens": 8500, "missed_opportunity_tokens": 500 },
    "recommended_profile": "managed_provider_tuning",
    "notes": ["Of 9000 candidate reusable tokens, 8500 were captured (94% capture rate)."]
  }
}
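The note in the report is derived from the waterfall, so you can recompute it from the response. The sketch below assumes that capture rate is realized_reused_tokens / candidate_reuse_tokens, rounded to a whole percent. This matches the note in the example above. The `summarize` helper is hypothetical, and the sample is a trimmed copy of that response.

```python
import json

response_text = """
{
  "object": "diagnostic",
  "report": {
    "workload_reuse_score": 78.5,
    "band": "strong_fit",
    "recommended_profile": "managed_provider_tuning",
    "waterfall": {"total_input_tokens": 10000, "eligible_reuse_tokens": 9000,
                  "candidate_reuse_tokens": 9000, "realized_reused_tokens": 8500,
                  "missed_opportunity_tokens": 500}
  }
}
"""


def summarize(diagnostic):
    """Pull the headline numbers out of a diagnostic response dict."""
    report = diagnostic["report"]
    wf = report["waterfall"]
    candidate = wf["candidate_reuse_tokens"]
    realized = wf["realized_reused_tokens"]
    # Assumed definition: share of candidate reuse that was actually served from cache.
    capture_rate = realized / candidate if candidate else 0.0
    note = (f"Of {candidate} candidate reusable tokens, {realized} were captured "
            f"({round(capture_rate * 100)}% capture rate).")
    return {
        "score": report["workload_reuse_score"],
        "band": report["band"],
        "profile": report["recommended_profile"],
        "capture_rate": capture_rate,
        "note": note,
    }


summary = summarize(json.loads(response_text))
print(summary["note"])
```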

Reading the Workload Reuse Score

The WRS is a 0-100 score built from six weighted components. A high score means there is reuse to capture; it does not mean you should self-host.

Component	Weight	What it measures
`opportunity_ratio`	0.35	Share of input tokens that could be served from cache.
`recurrence_score`	0.20	How often the same prefix family recurs.
`retention_locality`	0.15	Whether recurrences land close enough in time to stay warm.
`ttft_sensitivity`	0.15	How much first-token latency matters for this traffic.
`session_continuity`	0.10	How much work stays within a single session.
`payload_redundancy`	0.05	Repeated payloads across requests.
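To see which components dominate, treat the weights as a linear combination. This sketch assumes the WRS is 100 × Σ(weight × component), with every component in [0, 1]. The weights come from the table above. The plain weighted sum is an assumption, and `weighted_reuse_score` is a hypothetical helper.

```python
# Weights from the component table; they sum to 1.0.
WEIGHTS = {
    "opportunity_ratio": 0.35,
    "recurrence_score": 0.20,
    "retention_locality": 0.15,
    "ttft_sensitivity": 0.15,
    "session_continuity": 0.10,
    "payload_redundancy": 0.05,
}


def weighted_reuse_score(components):
    """Approximate WRS as 100 * sum(weight * component), components in [0, 1].

    Assumption: a plain linear combination; the service may apply further adjustments.
    """
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside [0, 1]")
        total += weight * value
    return round(100 * total, 1)
```

Applied to the example components in the response above, this plain sum gives 74.0 rather than the 78.5 shown. Read it as a way to reason about relative weight, not as a reimplementation of the service's scoring. Because `opportunity_ratio` alone carries 0.35, a workload with little cacheable input cannot score highly on the other components.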

The score maps to a band and a recommended action:

Band	Score	Recommended action
`strong_fit`	≥ 70	Prioritize an optimization pilot.
`plausible_fit`	≥ 45	Run diagnostic and provider tuning.
`limited_fit`	≥ 20	Optimize prompt construction first.
`weak_fit`	< 20	Don't pursue BYOC or custom caching.
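The band lookup is a threshold scan from the highest band down. This sketch mirrors the table and assumes the lower bounds are inclusive, as the ≥ signs indicate. `band_for` is a hypothetical helper. The API already returns `band` and `recommended_action` for you.

```python
# (inclusive lower bound, band, recommended action), checked highest first.
BANDS = [
    (70, "strong_fit", "Prioritize an optimization pilot."),
    (45, "plausible_fit", "Run diagnostic and provider tuning."),
    (20, "limited_fit", "Optimize prompt construction first."),
]


def band_for(score):
    """Map a 0-100 WRS to (band, recommended action)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score {score} is outside 0-100")
    for threshold, band, action in BANDS:
        if score >= threshold:
            return band, action
    return "weak_fit", "Don't pursue BYOC or custom caching."
```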

The reuse waterfall

The waterfall is where opportunity meets reality. Each of the first four tiers is a subset of the one above it. The last line is the gap between candidate and realized reuse:

total_input_tokens        10000   everything sent
eligible_reuse_tokens      9000   could be reused given the prefix family
candidate_reuse_tokens     9000   the runtime considered for reuse
realized_reused_tokens     8500   actually served from cache, billed at the read rate
missed_opportunity_tokens   500   candidate − realized: the gap to close

A large missed_opportunity_tokens value relative to candidate_reuse_tokens signals that prompt ordering or provider tuning will pay off. A small gap with a high capture rate means the providers are already doing the work, and no migration will beat them.
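The nesting and the gap can be checked mechanically. This sketch assumes the tiers nest as total ≥ eligible ≥ candidate ≥ realized, and that missed_opportunity_tokens equals candidate − realized exactly, as the waterfall defines it. The `Waterfall` class is a hypothetical local helper, not an API type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Waterfall:
    total_input_tokens: int
    eligible_reuse_tokens: int
    candidate_reuse_tokens: int
    realized_reused_tokens: int
    missed_opportunity_tokens: int

    def check(self):
        """Raise ValueError if the tiers are not nested or the gap is inconsistent."""
        if not (self.total_input_tokens >= self.eligible_reuse_tokens
                >= self.candidate_reuse_tokens >= self.realized_reused_tokens >= 0):
            raise ValueError("expected total >= eligible >= candidate >= realized >= 0")
        expected_missed = self.candidate_reuse_tokens - self.realized_reused_tokens
        if self.missed_opportunity_tokens != expected_missed:
            raise ValueError(f"missed_opportunity_tokens should be {expected_missed}")

    def missed_share(self):
        """Fraction of candidate reuse not realized (0.0 if nothing was a candidate)."""
        if self.candidate_reuse_tokens == 0:
            return 0.0
        return self.missed_opportunity_tokens / self.candidate_reuse_tokens


wf = Waterfall(10000, 9000, 9000, 8500, 500)
wf.check()
print(f"missed share: {wf.missed_share():.1%}")
```

For the example, the missed share is 500 / 9000, or about 5.6%. This small gap is the "providers are already doing the work" case.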

The recommended profile

recommended_profile is deliberately conservative. BYOC is only ever recommended when a large missed-opportunity gap justifies it, never on prompt length alone:

Profile	When
`optimize_prompt_construction`	Weak or limited fit - fix ordering before anything else.
`managed_provider_tuning`	Plausible fit, or strong fit where capture is already high (≥ 70%).
`byoc_pilot_worth_evaluating`	Strong fit and a large fraction of reuse is still being missed.
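The profile table reads as a small decision function. This sketch adds one assumption to the table: "a large fraction of reuse is still being missed" is taken as the complement of the ≥ 70% capture threshold, so strong-fit workloads below 70% capture land on BYOC. The service may use a different cut-off. `recommended_profile` here is a hypothetical local function.

```python
def recommended_profile(band, capture_rate, high_capture=0.70):
    """Hypothetical restatement of the profile table.

    capture_rate = realized_reused_tokens / candidate_reuse_tokens.
    Assumption: "large fraction still being missed" means capture < high_capture.
    """
    if band in ("weak_fit", "limited_fit"):
        return "optimize_prompt_construction"
    if band == "plausible_fit":
        return "managed_provider_tuning"
    if band == "strong_fit":
        if capture_rate >= high_capture:
            return "managed_provider_tuning"  # providers already capture most reuse
        return "byoc_pilot_worth_evaluating"
    raise ValueError(f"unknown band: {band}")
```

The function never looks at prompt length. BYOC is reachable only through the strong-fit branch with low capture.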

Signed report

GET /v2/diagnostics/{id}/report returns the same report wrapped with a generated_at timestamp and an evidence_digest (sig_<64 hex>) over the serialized report. The digest lets a recipient confirm the numbers were not altered after the fact - useful when the diagnostic is the basis for a pilot decision.
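A recipient-side check might look like the following sketch. It assumes the digest is `sig_` followed by a plain SHA-256 hex digest over a canonical JSON serialization (sorted keys, no extra whitespace) of the `report` object. That scheme is an assumption for illustration only. The service's actual serialization, and whether the digest is keyed, are not specified here, so confirm both before relying on this check. `evidence_digest` and `verify` are hypothetical helpers.

```python
import hashlib
import hmac
import json


def evidence_digest(report):
    """Hypothetical scheme: 'sig_' + SHA-256 hex of canonical JSON."""
    canonical = json.dumps(report, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sig_" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(report, claimed_digest):
    """Recompute the digest and compare in constant time."""
    return hmac.compare_digest(evidence_digest(report), claimed_digest)
```

Anyone can recompute an unkeyed hash. It therefore proves integrity only if the digest itself reached you through a channel you trust.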

Why a high score isn't a buy signal

Deployment readiness is scored separately from reuse opportunity.