
Reuse credit

How realized reuse becomes a visible credit on your bill: gross input charge minus reuse credit equals processed input charge.

When Zumik confirms that input was reused, the savings appear on your invoice as their own line item. The reuse credit makes captured reuse visible instead of burying it in an opaque blended rate.

The formula

The input side of every billed generation works out to:

gross_input_charge - reuse_credit = processed_input_charge

You are billed the processed input charge. The gross charge is what the input would have cost with no reuse; the reuse credit is what you saved because part of the input was served from a cache.
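As a worked example, the formula can be applied in a few lines of Python. This is a minimal sketch with made-up dollar amounts. It uses `Decimal` to avoid floating-point rounding on money. The two guard checks (credit is never negative and never exceeds the gross charge) are assumptions inferred from the credit being a saving on the input, not rules stated on this page.

```python
from decimal import Decimal


def processed_input_charge(gross_input_charge: Decimal, reuse_credit: Decimal) -> Decimal:
    """gross_input_charge - reuse_credit = processed_input_charge."""
    # Assumed invariants: the credit is a saving, so it is non-negative
    # and cannot push the processed charge below zero.
    if reuse_credit < 0:
        raise ValueError("reuse credit cannot be negative")
    if reuse_credit > gross_input_charge:
        raise ValueError("reuse credit cannot exceed the gross input charge")
    return gross_input_charge - reuse_credit


# Hypothetical figures for illustration only.
gross = Decimal("0.0300")   # what the input would have cost with no reuse
credit = Decimal("0.0135")  # saved because part of the input was served from cache
print(processed_input_charge(gross, credit))  # 0.0165
```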

How the credit is computed

Internally:

reuse_credit = realized_reused_tokens × profile_specific_discount

Two things matter here.

  • It is realized reused tokens, not candidate or eligible tokens. A reusable handle is not a cache hit, so only reuse that the provider or runtime actually confirmed earns a credit. See reuse metrics for the opportunity-versus-realized distinction and the evidence levels attached to each number.
  • The discount is profile-specific. Different providers expose different cache economics, so the rate that turns reused tokens into a credit depends on the execution profile and provider that served the request.
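Both points can be illustrated with a small sketch. The profile names, provider names, and per-token rates in `PROFILE_DISCOUNTS` are hypothetical placeholders; this page does not publish actual rates. The function accepts only a realized (confirmed) reused-token count, mirroring the first point, and looks up a rate by execution profile and provider, mirroring the second.

```python
from decimal import Decimal

# Hypothetical per-token discounts keyed by (execution profile, provider).
# Placeholder values only; real rates depend on each provider's cache economics.
PROFILE_DISCOUNTS = {
    ("standard", "provider-a"): Decimal("0.0000015"),
    ("standard", "provider-b"): Decimal("0.0000010"),
}


def reuse_credit(realized_reused_tokens: int, profile: str, provider: str) -> Decimal:
    """reuse_credit = realized_reused_tokens x profile_specific_discount.

    Pass only tokens the provider or runtime confirmed as reused;
    candidate or eligible tokens do not earn a credit.
    """
    if realized_reused_tokens < 0:
        raise ValueError("token count cannot be negative")
    # Raises KeyError for an unknown profile/provider pair.
    discount = PROFILE_DISCOUNTS[(profile, provider)]
    return realized_reused_tokens * discount


# The same 9,000 confirmed reused tokens earn different credits per provider.
print(reuse_credit(9_000, "standard", "provider-a"))  # 0.0135000
print(reuse_credit(9_000, "standard", "provider-b"))  # 0.0090000
```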

Why the input side only

Output tokens are always generated fresh, so they are never discounted. Reuse applies to the prefix you resend (system instructions, tools, schemas, and stable context), which is exactly the part that provider prompt caching serves from cache. Placing stable content first in your prompt is what lets the providers' caches hit in the first place.
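Why ordering matters can be shown with a toy model. The sketch below assumes a prefix-matching cache and treats each prompt segment as a single unit; real provider caches match on tokens and apply their own minimum lengths and eviction rules. The segment names and the `stable_first` / `volatile_first` helpers are hypothetical. Two requests that share the same stable segments only share a cacheable prefix when those segments come before the per-request content.

```python
# Toy model: each list item stands in for a block of tokens.
STABLE = ["<system instructions>", "<tool schemas>", "<reference context>"]


def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Count leading segments two requests have in common (prefix caches match from the start)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def stable_first(user_turn: str) -> list[str]:
    return STABLE + [user_turn]


def volatile_first(user_turn: str) -> list[str]:
    return [user_turn] + STABLE


q1, q2 = "first user question", "second user question"
print(shared_prefix_len(stable_first(q1), stable_first(q2)))      # 3
print(shared_prefix_len(volatile_first(q1), volatile_first(q2)))  # 0
```

With volatile content first, the very first segment differs, so nothing after it can match even though the stable segments are identical.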

Order prompts for cache hits

The reuse credit only appears when reuse is realized. Prompt layout is how you make that happen.

Where it appears

The reuse credit is a customer-facing line on the managed-provider invoice, alongside processed input tokens, generated output tokens, and the subscription. The detailed per-request reuse telemetry stays in the dashboards and on /v2/usage; it is not split into separate billing meters.

On /v1, the discounted input fraction is also reflected in the standard usage.prompt_tokens_details.cached_tokens field, so a vanilla OpenAI SDK can read the reuse without anything proprietary in the response body. See OpenAI compatibility.
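For instance, a client can read the field straight from the JSON body. The sketch below works on a parsed response dictionary so it runs without a network call; the token counts are made up. It assumes, as in OpenAI's schema, that `cached_tokens` is a subset of `prompt_tokens` and that `prompt_tokens_details` may be absent. With the OpenAI Python SDK the same value is available as `response.usage.prompt_tokens_details.cached_tokens`.

```python
import json


def cached_prompt_tokens(response: dict) -> int:
    """Return usage.prompt_tokens_details.cached_tokens, or 0 when absent."""
    usage = response.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    return details.get("cached_tokens") or 0


# Abbreviated /v1 chat completion body; the token counts are illustrative.
body = json.loads("""
{
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 12000,
    "completion_tokens": 350,
    "total_tokens": 12350,
    "prompt_tokens_details": {"cached_tokens": 9000}
  }
}
""")

cached = cached_prompt_tokens(body)
fresh = body["usage"]["prompt_tokens"] - cached  # input tokens processed without reuse
print(cached, fresh)  # 9000 3000
```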

Under BYOK the provider bills you directly, so cache discounts appear on your own provider account; Zumik still surfaces the realized-reuse telemetry so you can verify capture.
