
Workload Reuse Score

The WRS formula, its six weighted components, the interpretation bands, and the deliberately separate deployment-readiness score.

The Workload Reuse Score (WRS) is a calibrated 0-to-100 number that answers one question: how much reuse does this workload actually have to capture? It replaces the old "median prompt above 8k tokens" heuristic, which conflated prompt length with reuse value and led teams to self-host when they did not need to.

The formula

WRS is a weighted sum of six normalized components, each in the range 0.0 to 1.0, scaled to 0-100:

WRS = 100 × (
    0.35 × opportunity_ratio
  + 0.20 × recurrence_score
  + 0.15 × retention_locality
  + 0.15 × ttft_sensitivity
  + 0.10 × session_continuity
  + 0.05 × payload_redundancy
)

Out-of-range component values are clamped to [0, 1] before weighting. Because the six weights sum to 1.0, the score always lands between 0 and 100, so a noisy estimate can never push it past 100 or below 0.
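The formula and the clamping rule translate directly into a small function. This is a minimal sketch, not Zumik's implementation: the names `WEIGHTS`, `clamp01`, and `workload_reuse_score` are hypothetical, and components are assumed to arrive as a plain dict keyed by the component names.

```python
# Weights from the WRS formula. They sum to 1.0, so clamped inputs keep the result in [0, 100].
WEIGHTS = {
    "opportunity_ratio": 0.35,
    "recurrence_score": 0.20,
    "retention_locality": 0.15,
    "ttft_sensitivity": 0.15,
    "session_continuity": 0.10,
    "payload_redundancy": 0.05,
}


def clamp01(x: float) -> float:
    """Clamp a component estimate to the closed interval [0, 1]."""
    return max(0.0, min(1.0, x))


def workload_reuse_score(components: dict[str, float]) -> float:
    """Weighted sum of the six clamped components, scaled to 0-100 (hypothetical helper)."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Clamp each component before weighting, as the formula requires.
    return 100 * sum(w * clamp01(components[name]) for name, w in WEIGHTS.items())
```

A noisy estimate such as `opportunity_ratio = 1.3` is treated as 1.0, so it contributes at most 35 points.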

The six components

Component            Weight  Definition
opportunity_ratio    0.35    Candidate reusable input tokens divided by total input tokens
recurrence_score     0.20    How often equivalent reusable prefix families recur
retention_locality   0.15    Share of recurrence falling inside relevant cache-retention windows
ttft_sensitivity     0.15    How much prefill latency matters to the customer experience or SLOs
session_continuity   0.10    Share of traffic in multi-turn or branched sessions
payload_redundancy   0.05    Repeated serialized bytes replaceable with reusable state references
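Two of the components are simple ratios over trace metadata. The sketch below computes them from per-request records, assuming a hypothetical `RequestMeta` record with token counts and a session flag; the field names are illustrative, not a documented trace schema. It also assumes "share of traffic" means share of requests, which the definition does not pin down.

```python
from dataclasses import dataclass


@dataclass
class RequestMeta:
    """Hypothetical per-request metadata record; no prompt text is needed."""
    input_tokens: int            # total input tokens for the request
    reusable_prefix_tokens: int  # input tokens estimated to be reusable (candidate reuse)
    in_multi_turn_session: bool  # True if the request belongs to a multi-turn or branched session


def opportunity_ratio(requests: list[RequestMeta]) -> float:
    """Candidate reusable input tokens divided by total input tokens, aggregated over all requests."""
    total = sum(r.input_tokens for r in requests)
    if total == 0:
        return 0.0
    # Cap each request's reusable count at its input size so bad estimates cannot exceed 1.0.
    reusable = sum(min(r.reusable_prefix_tokens, r.input_tokens) for r in requests)
    return reusable / total


def session_continuity(requests: list[RequestMeta]) -> float:
    """Share of requests (assumed unit of 'traffic') in multi-turn or branched sessions."""
    if not requests:
        return 0.0
    return sum(r.in_multi_turn_session for r in requests) / len(requests)
```

For example, one 1,000-token request with 800 reusable tokens in a session plus one 3,000-token standalone request with 600 reusable tokens gives an opportunity ratio of 1,400 / 4,000 = 0.35 and a session continuity of 0.5.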

Opportunity dominates the score by design. If most of your input is genuinely reusable, that alone contributes up to 35 points, half of the 70 a strong fit requires. Recurrence and retention locality, however, decide whether that opportunity is reachable in practice. High opportunity with no recurrence is reuse you can never capture.
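The arithmetic below checks this using only the weights in the formula, with two illustrative component profiles (the values are hypothetical, chosen for the example). With opportunity_ratio = 0.9 and every other component at 0, the score falls in the Limited fit band. Even with ttft_sensitivity, session_continuity, and payload_redundancy all at 1.0, a workload with no recurrence and no retention locality tops out below the Strong fit threshold of 70.

```latex
\begin{aligned}
\text{opportunity only:}\quad & 100 \times (0.35 \times 0.9) = 31.5 \\
\text{ceiling with no recurrence or retention locality:}\quad & 100 \times (0.35 + 0.15 + 0.10 + 0.05) = 65 < 70
\end{aligned}
```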

Interpretation bands

The score maps to a band, and each band has one recommended action.

WRS      Band           Recommended action
70-100   Strong fit     Prioritize an optimization pilot
45-69    Plausible fit  Run a diagnostic and tune providers
20-44    Limited fit    Optimize prompt construction first
0-19     Weak fit       Do not sell BYOC or custom caching
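The band table maps to a simple threshold lookup. This is a sketch with a hypothetical `wrs_band` helper. The table lists integer ranges only, so it assumes fractional scores fall into the lower band until they reach the next threshold (69.9 is Plausible fit, 70.0 is Strong fit), and that the input is already a valid 0-100 score.

```python
def wrs_band(score: float) -> tuple[str, str]:
    """Map a WRS value (assumed already in 0-100) to its band and recommended action."""
    # Thresholds are the lower bounds of each band in the table; checked from highest down.
    if score >= 70:
        return "Strong fit", "Prioritize an optimization pilot"
    if score >= 45:
        return "Plausible fit", "Run a diagnostic and tune providers"
    if score >= 20:
        return "Limited fit", "Optimize prompt construction first"
    return "Weak fit", "Do not sell BYOC or custom caching"
```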

A high WRS means there is reuse worth capturing. It does not mean you should self-host. Even a strong-fit workload usually belongs on managed providers if provider-native caching already captures most of the available reuse. The decision to evaluate BYOC depends on a large gap between the available reuse and what provider caching already captures (see reuse metrics), not on the score alone.
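One way to express the missed gap is as the share of reusable tokens that provider-native caching does not capture. The sketch below assumes two hypothetical trace totals, reusable tokens and provider-cached tokens, and a hypothetical cutoff of 0.5 for a "large" gap. The actual definitions live in the reuse metrics and may differ.

```python
from dataclasses import dataclass


@dataclass
class ReuseCapture:
    reusable_tokens: int         # hypothetical: reusable input tokens observed in traces
    provider_cached_tokens: int  # hypothetical: tokens already served from provider-native caches

    @property
    def capture_rate(self) -> float:
        """Fraction of the available reuse that provider caching already captures."""
        if self.reusable_tokens == 0:
            return 1.0  # nothing to capture, so nothing is missed
        return min(self.provider_cached_tokens, self.reusable_tokens) / self.reusable_tokens

    @property
    def missed_tokens(self) -> int:
        """Reusable tokens that provider caching does not capture."""
        return max(self.reusable_tokens - self.provider_cached_tokens, 0)


def has_large_missed_gap(c: ReuseCapture, min_missed_share: float = 0.5) -> bool:
    """Hypothetical rule: the gap is 'large' when at least min_missed_share of the reuse is missed."""
    return (1.0 - c.capture_rate) >= min_missed_share
```

A workload where providers already serve 90,000 of 100,000 reusable tokens from cache has only 10,000 missed tokens and no large gap, however high its WRS.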

Deployment readiness is a separate score

Infrastructure feasibility is tracked independently, so a long-prompt workload can never trigger a BYOC recommendation by accident. Deployment readiness covers:

  • provider capability coverage
  • BYOK feasibility
  • BYOC security approvals
  • cloud and region constraints
  • model stability
  • expected traffic concentration
  • engineering bandwidth
  • acceptable operational burden

WRS asks "is there reuse to capture?" Deployment readiness asks "can this customer realistically operate a self-hosted profile?" Both must be favorable before a BYOC profile is on the table. Folding them together is the exact mistake the two-score split prevents.
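The two-score split can be modeled as two independent gates that must both pass. This is a sketch: `WorkloadAssessment` is hypothetical, readiness is assumed to use a 0-100 scale, and both default cutoffs of 70 are assumptions (70 is the lower bound of the Strong fit band; the readiness cutoff is not specified in the source).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadAssessment:
    wrs: float        # Workload Reuse Score, 0-100
    readiness: float  # deployment-readiness score; a 0-100 scale is an assumption

    def byoc_on_the_table(self, wrs_min: float = 70.0, readiness_min: float = 70.0) -> bool:
        # Each score is checked against its own cutoff. They are never averaged or summed,
        # so a high value on one cannot compensate for a low value on the other.
        return self.wrs >= wrs_min and self.readiness >= readiness_min
```

Averaging would reintroduce the mistake the split prevents: a workload with WRS 90 and readiness 50 averages to 70 and would clear a single 70 bar, even though the customer is not ready to operate a self-hosted profile.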

Where the score comes from

The diagnostic derives the six components from observed metadata traces - no raw prompt text required - and the report justifies every number. A strong score where provider-native caching already captures most of the reuse leads to a provider-tuning recommendation; a strong score with poor capture is what flags BYOC as worth evaluating.
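For strong-fit workloads, the routing described here depends on capture alone. This is a minimal sketch with a hypothetical `strong_score_route` helper. It assumes capture is expressed as a 0-1 fraction of available reuse, and it uses a hypothetical cutoff of 0.5 for "poor" capture. An "evaluate BYOC" result still has to clear deployment readiness before a BYOC profile is on the table.

```python
STRONG_FIT_MIN = 70.0  # lower bound of the "Strong fit" band


def strong_score_route(wrs: float, capture_rate: float, poor_capture_max: float = 0.5) -> str:
    """Route a strong-fit workload by how much reuse provider caching already captures.

    capture_rate: fraction (0-1) of available reuse captured by provider-native caching.
    poor_capture_max: hypothetical cutoff at or below which capture counts as poor.
    """
    if wrs < STRONG_FIT_MIN:
        raise ValueError("routing applies only to strong-fit workloads (WRS >= 70)")
    if capture_rate <= poor_capture_max:
        # Large missed gap: worth evaluating BYOC, still subject to deployment readiness.
        return "evaluate BYOC"
    # Provider caching already captures most of the reuse: tune providers instead.
    return "tune providers"
```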
