Workload Reuse Score
The WRS formula, its six weighted components, the interpretation bands, and the deliberately separate deployment-readiness score.
The Workload Reuse Score (WRS) is a calibrated 0-to-100 number that answers one question: how much reuse does this workload actually have to capture? It replaces the old "median prompt above 8k tokens" heuristic, which conflated prompt length with reuse value and led teams to self-host when they did not need to.
The formula
WRS is a weighted sum of six normalized components, each in the range 0.0 to 1.0, scaled to 0-100:
WRS = 100 × (
      0.35 × opportunity_ratio
    + 0.20 × recurrence_score
    + 0.15 × retention_locality
    + 0.15 × ttft_sensitivity
    + 0.10 × session_continuity
    + 0.05 × payload_redundancy
)

Out-of-range component values are clamped to [0, 1] before weighting, so a noisy estimate can never push the score past 100 or below 0.
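The formula and the clamping rule can be sketched in a few lines of Python. This is a minimal illustration, not the diagnostic's actual implementation: the names `WEIGHT_POINTS`, `clamp01`, and `workload_reuse_score` are assumptions made for this example, and rejecting NaN inputs is a choice of this sketch that the text above does not specify.

```python
import math

# Weights from the formula, pre-multiplied by 100 so that an all-ones
# input sums to exactly 100.0 (35 * x is the same as 100 * 0.35 * x).
WEIGHT_POINTS = {
    "opportunity_ratio": 35,
    "recurrence_score": 20,
    "retention_locality": 15,
    "ttft_sensitivity": 15,
    "session_continuity": 10,
    "payload_redundancy": 5,
}


def clamp01(value: float) -> float:
    """Clamp a component estimate to [0, 1]."""
    if math.isnan(value):
        # Assumption of this sketch: NaN is treated as a data error.
        raise ValueError("component value is NaN")
    return max(0.0, min(1.0, value))


def workload_reuse_score(components: dict[str, float]) -> float:
    """Weighted sum of the six clamped components, on a 0-100 scale."""
    missing = WEIGHT_POINTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(
        points * clamp01(components[name])
        for name, points in WEIGHT_POINTS.items()
    )


# Illustrative values only, not measurements from a real workload.
example = {
    "opportunity_ratio": 0.8,
    "recurrence_score": 0.6,
    "retention_locality": 0.5,
    "ttft_sensitivity": 0.7,
    "session_continuity": 0.4,
    "payload_redundancy": 0.2,
}
print(round(workload_reuse_score(example), 1))  # 63.0
```

Because each component is clamped before weighting and the weights sum to 1, the result stays within 0 to 100 regardless of how noisy an individual estimate is.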
The six components
| Component | Weight | Definition |
|---|---|---|
| opportunity_ratio | 0.35 | Candidate reusable input tokens divided by total input tokens |
| recurrence_score | 0.20 | How often equivalent reusable prefix families recur |
| retention_locality | 0.15 | Share of recurrence falling inside relevant cache-retention windows |
| ttft_sensitivity | 0.15 | How much prefill latency matters to the customer experience or SLOs |
| session_continuity | 0.10 | Share of traffic in multi-turn or branched sessions |
| payload_redundancy | 0.05 | Repeated serialized bytes replaceable with reusable state references |
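Two of these definitions are simple ratios, so they can be sketched directly. The example below is a rough illustration under stated assumptions: the `TraceRecord` type and its field names are hypothetical, "traffic" is assumed to mean request count, and a session is assumed to count as multi-turn when it contains more than one request. The real diagnostic may define these differently.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    """Hypothetical metadata for one request; carries no prompt text."""
    session_id: str
    input_tokens: int
    reusable_input_tokens: int  # candidate reusable tokens in this request


def opportunity_ratio(records: list[TraceRecord]) -> float:
    """Candidate reusable input tokens divided by total input tokens."""
    total = sum(r.input_tokens for r in records)
    if total == 0:
        return 0.0
    return sum(r.reusable_input_tokens for r in records) / total


def session_continuity(records: list[TraceRecord]) -> float:
    """Share of requests that belong to sessions with more than one request."""
    if not records:
        return 0.0
    per_session = Counter(r.session_id for r in records)
    multi_turn = sum(1 for r in records if per_session[r.session_id] > 1)
    return multi_turn / len(records)


records = [
    TraceRecord("a", 1000, 800),
    TraceRecord("a", 1000, 900),
    TraceRecord("b", 2000, 300),
]
print(opportunity_ratio(records))  # 0.5
```

The remaining components depend on prefix-family matching, cache-retention windows, and SLO context, which this sketch does not attempt to model.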
Opportunity dominates the score by design. If most of your input is genuinely reusable, that component alone can contribute up to 35 of the 100 points, half of the 70 needed for a strong fit, but recurrence and retention locality decide whether that opportunity is reachable in practice. High opportunity with no recurrence is reuse you can never capture.
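A worked example makes the point concrete. Assume a hypothetical workload with an opportunity ratio of 0.90, no recurrence (so recurrence_score and retention_locality are both 0), and illustrative values of 0.50, 0.30, and 0.20 for ttft_sensitivity, session_continuity, and payload_redundancy. These numbers are invented for illustration only.

```latex
\begin{aligned}
\mathrm{WRS} &= 100 \times (0.35 \cdot 0.90 + 0.20 \cdot 0 + 0.15 \cdot 0
                + 0.15 \cdot 0.50 + 0.10 \cdot 0.30 + 0.05 \cdot 0.20) \\
             &= 100 \times (0.315 + 0 + 0 + 0.075 + 0.030 + 0.010) \\
             &= 43
\end{aligned}
```

Despite 90% of input tokens being candidates for reuse, the score is 43, which falls in the 20-44 "Limited fit" range.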
Interpretation bands
The score maps to a band, and each band has one recommended action.
| WRS | Band | Recommended action |
|---|---|---|
| 70-100 | Strong fit | Prioritize an optimization pilot |
| 45-69 | Plausible fit | Run a diagnostic and tune providers |
| 20-44 | Limited fit | Optimize prompt construction first |
| 0-19 | Weak fit | Do not sell BYOC or custom caching |
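The band lookup can be sketched as a small threshold table. One assumption is needed: the table lists integer ranges, so this sketch treats each band's lower bound as an inclusive threshold, which places a fractional score such as 69.5 in "Plausible fit". The names `BANDS` and `interpret` are illustrative.

```python
# (lower bound, band, recommended action), highest band first.
BANDS = [
    (70, "Strong fit", "Prioritize an optimization pilot"),
    (45, "Plausible fit", "Run a diagnostic and tune providers"),
    (20, "Limited fit", "Optimize prompt construction first"),
    (0, "Weak fit", "Do not sell BYOC or custom caching"),
]


def interpret(score: float) -> tuple[str, str]:
    """Return (band, recommended action) for a WRS between 0 and 100."""
    if not 0 <= score <= 100:  # also rejects NaN
        raise ValueError(f"WRS must be within 0-100, got {score!r}")
    for lower_bound, band, action in BANDS:
        if score >= lower_bound:
            return band, action
    raise AssertionError("unreachable: the last band starts at 0")


print(interpret(63.0))  # ('Plausible fit', 'Run a diagnostic and tune providers')
```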
A high WRS means there is reuse worth capturing. It does not mean you should self-host. Even a strong-fit workload usually belongs on managed providers if provider-native caching already captures most of the available reuse. The decision to evaluate BYOC depends on a large missed gap (see reuse metrics), not on the score alone.
Deployment readiness is a separate score
Infrastructure feasibility is tracked independently, so a long-prompt workload never triggers a BYOC recommendation by accident. Deployment readiness covers:
- provider capability coverage
- BYOK feasibility
- BYOC security approvals
- cloud and region constraints
- model stability
- expected traffic concentration
- engineering bandwidth
- acceptable operational burden
WRS asks "is there reuse to capture?" Deployment readiness asks "can this customer realistically operate a self-hosted profile?" Both must be favorable before a BYOC profile is on the table. Folding them together is the exact mistake the two-score split prevents.
Where the score comes from
The diagnostic derives the six components from observed metadata traces - no raw prompt text required - and the report defends every number. A strong score with high capture recommends provider tuning; a strong score with poor capture is what flags BYOC as worth evaluating.
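The way the two scores and the capture measurement combine can be sketched as a short decision function. This is an assumption-heavy illustration: `capture_rate` (the share of available reuse that provider-native caching already captures), the 0.5 cutoff for "most", the boolean `readiness_favorable`, and the returned strings are all hypothetical stand-ins, not values defined by the diagnostic.

```python
def recommend(
    wrs: float,
    capture_rate: float,
    readiness_favorable: bool,
    *,
    strong_fit_floor: float = 70.0,  # lower bound of the strong-fit band
    high_capture: float = 0.5,       # hypothetical cutoff for "most reuse captured"
) -> str:
    """Sketch of the two-score decision; thresholds are placeholders."""
    if not 0.0 <= wrs <= 100.0:
        raise ValueError("wrs must be within 0-100")
    if not 0.0 <= capture_rate <= 1.0:
        raise ValueError("capture_rate must be within 0-1")

    if wrs < strong_fit_floor:
        # Below strong fit, the band's own recommended action applies.
        return "follow the band's recommended action"
    if capture_rate >= high_capture:
        # Providers already capture most of the reuse: stay managed.
        return "tune managed providers"
    if not readiness_favorable:
        # A large missed gap alone is not enough; readiness must also hold.
        return "reuse gap found, but deployment readiness blocks BYOC"
    return "evaluate BYOC"


print(recommend(82.0, 0.85, True))  # tune managed providers
print(recommend(82.0, 0.20, True))  # evaluate BYOC
```

Keeping readiness as a separate input mirrors the two-score split: a high WRS never reaches the BYOC branch unless both the missed gap and deployment readiness support it.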