Workload Reuse Score

The WRS formula, its six weighted components, the interpretation bands, and the deliberately separate deployment-readiness score.

The Workload Reuse Score (WRS) is a calibrated 0-to-100 number that answers one question: how much reuse does this workload actually have to capture? It replaces the old "median prompt above 8k tokens" heuristic, which conflated prompt length with reuse value and led teams to self-host when they did not need to.

The formula

WRS is a weighted sum of six normalized components, each in the range 0.0 to 1.0, scaled to 0-100:

WRS = 100 × (
    0.35 × opportunity_ratio
  + 0.20 × recurrence_score
  + 0.15 × retention_locality
  + 0.15 × ttft_sensitivity
  + 0.10 × session_continuity
  + 0.05 × payload_redundancy
)

Out-of-range component values are clamped to [0, 1] before weighting, so a noisy estimate can never push the score past 100 or below 0.
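The formula and the clamping rule can be sketched in a few lines of Python. This is a minimal illustration, not the diagnostic's actual implementation: the names `WEIGHTS`, `clamp01`, and `workload_reuse_score` are hypothetical, and rejecting NaN inputs is an assumption the text above does not state.

```python
import math

# Weights copied from the formula above; they sum to 1.0.
WEIGHTS = {
    "opportunity_ratio": 0.35,
    "recurrence_score": 0.20,
    "retention_locality": 0.15,
    "ttft_sensitivity": 0.15,
    "session_continuity": 0.10,
    "payload_redundancy": 0.05,
}


def clamp01(value: float) -> float:
    """Clamp one component estimate to the range [0, 1]."""
    if math.isnan(value):
        # Assumption: a NaN estimate is an error, not something to clamp.
        raise ValueError("component value is NaN")
    return max(0.0, min(1.0, value))


def workload_reuse_score(components: dict[str, float]) -> float:
    """Return the weighted sum of the six clamped components, scaled to 0-100."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    weighted = sum(w * clamp01(components[name]) for name, w in WEIGHTS.items())
    # The final clamp only absorbs floating-point rounding in the sum.
    return min(100.0, max(0.0, 100.0 * weighted))


# Illustrative values only; two of them are deliberately out of range.
example = {
    "opportunity_ratio": 0.8,
    "recurrence_score": 0.6,
    "retention_locality": 0.5,
    "ttft_sensitivity": 1.4,     # clamped to 1.0
    "session_continuity": 0.3,
    "payload_redundancy": -0.2,  # clamped to 0.0
}
print(round(workload_reuse_score(example), 1))
```

With these made-up inputs the weighted sum is 0.28 + 0.12 + 0.075 + 0.15 + 0.03 + 0, so the score comes out at about 65.5.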

The six components

Component	Weight	Definition
`opportunity_ratio`	0.35	Candidate reusable input tokens divided by total input tokens
`recurrence_score`	0.20	How often equivalent reusable prefix families recur
`retention_locality`	0.15	Share of recurrence falling inside relevant cache-retention windows
`ttft_sensitivity`	0.15	How much prefill latency matters to the customer experience or SLOs
`session_continuity`	0.10	Share of traffic in multi-turn or branched sessions
`payload_redundancy`	0.05	Repeated serialized bytes replaceable with reusable state references

Opportunity dominates the score by design. If most of your input is genuinely reusable, that component alone can contribute up to 35 of the 100 points, half of what the strong-fit band requires. Recurrence and retention locality then decide whether that opportunity is reachable in practice. High opportunity with no recurrence is reuse you can never capture.
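A small worked comparison makes the point about unreachable opportunity concrete. Both workloads below are hypothetical. Treating unlisted components as 0.0 is a simplification for this illustration, not how the diagnostic fills in missing data.

```python
# Weights copied from the formula above.
WEIGHTS = {
    "opportunity_ratio": 0.35,
    "recurrence_score": 0.20,
    "retention_locality": 0.15,
    "ttft_sensitivity": 0.15,
    "session_continuity": 0.10,
    "payload_redundancy": 0.05,
}


def wrs(components: dict) -> float:
    """Weighted sum scaled to 0-100; unlisted components count as 0.0 here."""
    return 100 * sum(
        w * min(1.0, max(0.0, components.get(name, 0.0)))
        for name, w in WEIGHTS.items()
    )


# Hypothetical workload A: 90% of input looks reusable, but it never recurs.
one_off = {"opportunity_ratio": 0.9}

# Hypothetical workload B: less raw opportunity, but it recurs inside
# the cache-retention windows.
recurring = {
    "opportunity_ratio": 0.6,
    "recurrence_score": 0.8,
    "retention_locality": 0.8,
}

print(round(wrs(one_off), 1), round(wrs(recurring), 1))
```

Workload A scores about 31.5 (0.35 × 0.9 × 100) despite its high opportunity ratio, while workload B reaches about 49 (21 + 16 + 12) with a third less opportunity, because its reuse actually recurs where a cache can hold it.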

Interpretation bands

The score maps to a band, and each band has one recommended action.

WRS	Band	Recommended action
70-100	Strong fit	Prioritize an optimization pilot
45-69	Plausible fit	Run a diagnostic and tune providers
20-44	Limited fit	Optimize prompt construction first
0-19	Weak fit	Do not sell BYOC or custom caching
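The band lookup can be expressed as a threshold table. This is a sketch: `BANDS` and `wrs_band` are hypothetical names, and assigning fractional scores by each band's lower bound (so 69.5 counts as a plausible fit) is an assumption, since the table above lists whole-number ranges only.

```python
# (lower bound, band, recommended action), ordered from highest to lowest.
BANDS = [
    (70, "Strong fit", "Prioritize an optimization pilot"),
    (45, "Plausible fit", "Run a diagnostic and tune providers"),
    (20, "Limited fit", "Optimize prompt construction first"),
    (0, "Weak fit", "Do not sell BYOC or custom caching"),
]


def wrs_band(score: float) -> tuple[str, str]:
    """Return (band, recommended action) for a WRS in the range 0-100."""
    if not 0 <= score <= 100:
        raise ValueError(f"WRS must be between 0 and 100, got {score!r}")
    for lower_bound, band, action in BANDS:
        # Assumption: a band starts at its lower bound and runs up to the next one.
        if score >= lower_bound:
            return band, action
    raise AssertionError("unreachable: the last band starts at 0")


band, action = wrs_band(65.5)
print(f"{band}: {action}")
```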

A high WRS means there is reuse worth capturing. It does not mean you should self-host. Even a strong-fit workload usually belongs on managed providers if provider-native caching already captures most of the available reuse. The decision to evaluate BYOC depends on a large missed gap, meaning available reuse that provider-native caching fails to capture (see Reuse metrics), not on the score alone.

Deployment readiness is a separate score

Infrastructure feasibility is tracked independently, so a long-prompt workload never triggers a BYOC recommendation by accident. Deployment readiness covers:

provider capability coverage
BYOK feasibility
BYOC security approvals
cloud and region constraints
model stability
expected traffic concentration
engineering bandwidth
acceptable operational burden

WRS asks "is there reuse to capture?" Deployment readiness asks "can this customer realistically operate a self-hosted profile?" Both must be favorable before a BYOC profile is on the table. Folding them together is the exact mistake the two-score split prevents.

Where the score comes from

The diagnostic derives the six components from observed metadata traces - no raw prompt text required - and the report defends every number. A strong score with high capture recommends provider tuning; a strong score with poor capture is what flags BYOC as worth evaluating.
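The way the score, the capture rate, and deployment readiness combine can be sketched as a small decision function. Everything here is illustrative: `next_step`, the `NextStep` names, and the 0.70 capture cutoff are hypothetical, and only the strong-fit threshold of 70 comes from the bands table. The outcome for a strong score with poor capture but unfavorable readiness is also an assumption, since the text only says both scores must be favorable before BYOC is on the table.

```python
from enum import Enum


class NextStep(Enum):
    FOLLOW_BAND_ACTION = "follow the recommended action for the score's band"
    TUNE_PROVIDERS = "tune managed providers"
    EVALUATE_BYOC = "evaluate a BYOC profile"
    BLOCKED_ON_READINESS = "BYOC flagged, but deployment readiness is unfavorable"


STRONG_FIT_MIN = 70      # from the interpretation bands
HIGH_CAPTURE_MIN = 0.70  # hypothetical cutoff; the source gives no number


def next_step(wrs: float, capture_rate: float, deployment_ready: bool) -> NextStep:
    """Combine WRS, observed capture, and readiness into one suggestion."""
    if wrs < STRONG_FIT_MIN:
        return NextStep.FOLLOW_BAND_ACTION
    if capture_rate >= HIGH_CAPTURE_MIN:
        # Strong score, and providers already capture most of the reuse.
        return NextStep.TUNE_PROVIDERS
    if deployment_ready:
        # Strong score, poor capture, and the customer can operate it.
        return NextStep.EVALUATE_BYOC
    return NextStep.BLOCKED_ON_READINESS


print(next_step(82, 0.85, True).value)
print(next_step(82, 0.30, True).value)
```

Keeping `deployment_ready` as a separate argument mirrors the two-score split: a high WRS alone can never produce `EVALUATE_BYOC`.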

Reuse metrics

The opportunity and capture numbers behind the components.

Workload diagnostics

Run a scored diagnostic on your own traffic.