Billing and budgets
The pricing tiers, prepaid credits, hard monthly caps, 50/80/100% alerts, per-API-key budgets, and opt-in overage - plus how to read your account and what each error means.
Zumik bills pay-as-you-go on prepaid credits, with hard budget controls so spend can never
surprise you. Costs are tracked in micros: 1_000_000 micros = $1.
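The micros convention can be sketched as a pair of conversion helpers. This is a minimal illustration, not part of any Zumik SDK: the function names `usd_to_micros` and `micros_to_usd` are hypothetical, and only the ratio (1_000_000 micros = $1) comes from the text above. It uses `Decimal` so that dollar strings convert without floating-point drift.

```python
from decimal import Decimal

MICROS_PER_USD = 1_000_000  # from the text: 1_000_000 micros = $1


def usd_to_micros(usd: str) -> int:
    """Convert a dollar amount given as a string (e.g. "12.50") to integer micros."""
    micros = Decimal(usd) * MICROS_PER_USD
    if micros != micros.to_integral_value():
        # Amounts finer than one micro cannot be represented exactly.
        raise ValueError(f"{usd} is finer than one micro")
    return int(micros)


def micros_to_usd(micros: int) -> Decimal:
    """Convert integer micros back to an exact Decimal dollar amount."""
    return Decimal(micros) / MICROS_PER_USD


print(usd_to_micros("12.50"))    # 12500000
print(micros_to_usd(4_200_000))  # 4.2
```

Keeping amounts as integer micros end to end, and converting to dollars only for display, avoids rounding surprises when many small charges are summed.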
Tiers
| Tier | Price | Includes |
|---|---|---|
| Pay-as-you-go credits | Prepaid, from $5 | Buy credits via Stripe; added to the balance in real time. Credits cover processed input and generated output tokens at published per-token rates. Unused credits stay on your balance. |
| Managed optimization pilot | $5,000-$20,000 / month + usage | Dedicated sales engineering and SLA-backed optimization reviews. |
| BYOC pilot / enterprise | Negotiated | Annual commitments via Stripe Invoicing. |
On BYOK and BYOC traffic, the provider bills you directly for its spend (or the work runs on your own infrastructure); Zumik charges only a per-request control-plane fee on those paths. The control-plane fee itself applies on every path, not only BYOK and BYOC.
Add credits
Inference requires a positive prepaid credit balance. With an empty balance, every inference call
returns 402 credits_required.
# Returns a PaymentIntent client secret for a $25 credit top-up. The console mounts the Stripe
# Payment Element with it, so the card is entered on Zumik and never leaves for a hosted page.
curl -X POST https://api.zumik.ai/v2/billing/payment-intent \
-H "Authorization: Bearer zk_live_..." \
-H "content-type: application/json" \
-d '{"amount_usd": 25}'
# => { "client_secret": "pi_..._secret_...", "publishable_key": "pk_live_..." }
Use POST /v2/billing/portal to open the Stripe customer portal once you've made a top-up; it
manages your card and invoices.
Read your account
GET /v2/billing/account returns the live state:
{
"subscription_status": "none",
"credit_balance_micros": 4200000,
"monthly_budget_micros": 50000000,
"cycle_spend_micros": 12500000,
"overage_mode": "pause",
"plan": "base"
}
subscription_status is none, active, past_due, or canceled. It no longer gates inference - a positive credit_balance_micros does. (It still reflects enterprise-contract state.)
overage_mode is pause (default) or allow.
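As a rough sketch of how the account fields relate, the snippet below parses the sample response shown above and derives dollar figures and budget headroom. The helper name `summarize_account` and its output keys are hypothetical, and it assumes `monthly_budget_micros` is null when no cap is set, which the text does not confirm.

```python
import json

# Sample payload copied from the GET /v2/billing/account response shown above.
ACCOUNT_JSON = """
{
  "subscription_status": "none",
  "credit_balance_micros": 4200000,
  "monthly_budget_micros": 50000000,
  "cycle_spend_micros": 12500000,
  "overage_mode": "pause",
  "plan": "base"
}
"""


def summarize_account(account: dict) -> dict:
    """Derive dollar figures and budget headroom from the micros fields."""
    budget = account.get("monthly_budget_micros")  # assumed None when no cap is set
    spend = account["cycle_spend_micros"]
    return {
        "balance_usd": account["credit_balance_micros"] / 1_000_000,
        "spend_usd": spend / 1_000_000,
        "budget_remaining_usd": None if budget is None else max(budget - spend, 0) / 1_000_000,
        "budget_used_pct": None if not budget else round(100 * spend / budget, 1),
        # Per the text, a positive credit balance is what gates inference.
        "has_credits": account["credit_balance_micros"] > 0,
    }


summary = summarize_account(json.loads(ACCOUNT_JSON))
print(summary)
```

For the sample account this gives a $4.20 balance, $12.50 spent this cycle, and $37.50 of the $50 budget remaining (25% used).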
Hard caps and alerts
POST /v2/billing/budget sets a hard monthly cap in dollars. Setting it also arms soft alerts at
50%, 80%, and 100% of the cap. Alerts are delivered by email and webhook, and are triggered by Stripe billing events rather than by polling.
curl -X POST https://api.zumik.ai/v2/billing/budget \
-H "Authorization: Bearer zk_live_..." \
-H "Content-Type: application/json" \
-d '{"monthly_budget_usd": 50.0}'
Pass null to remove the cap. A negative value returns 400 with param monthly_budget_usd.
When the cap is reached in pause mode, new inference returns 429 quota_exceeded - the same shape as
OpenAI's quota error, so existing retry/backoff treats it identically.
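To make the alert levels concrete, here is a small sketch that computes the 50/80/100% thresholds for a cap and reports which ones a given spend has reached. The function names are hypothetical, and treating a threshold as reached when spend is greater than or equal to it is an assumption; the text does not say whether the comparison is inclusive.

```python
ALERT_PERCENTS = (50, 80, 100)  # alert levels named in the text


def alert_thresholds(cap_micros: int) -> dict[int, int]:
    """Map each alert percentage to its spend threshold in micros."""
    return {pct: cap_micros * pct // 100 for pct in ALERT_PERCENTS}


def alerts_crossed(cap_micros: int, spend_micros: int) -> list[int]:
    """Return the alert percentages whose threshold the spend has reached (assumed inclusive)."""
    return [
        pct
        for pct, level in alert_thresholds(cap_micros).items()
        if spend_micros >= level
    ]


cap = 50_000_000  # the $50 cap from the example above
print(alert_thresholds(cap))            # {50: 25000000, 80: 40000000, 100: 50000000}
print(alerts_crossed(cap, 41_000_000))  # [50, 80]
```

With a $50 cap the alerts land at $25, $40, and $50 of cycle spend.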
Overage
By default inference pauses at the cap. To keep serving past it - billed at standard pay-as-you-go
rates - opt in explicitly. confirm: true is required; enabling overage without it returns 400.
curl -X POST https://api.zumik.ai/v2/billing/overage \
-H "Authorization: Bearer zk_live_..." \
-H "Content-Type: application/json" \
-d '{"allow_overage": true, "confirm": true}'
The opt-in is recorded in the audit log.
Per-API-key budgets
A single API key can carry its own limit, independent of the project cap - useful so one team
member's key can't drain a shared budget. POST /v2/api-keys/{key_id}/budget:
curl -X POST https://api.zumik.ai/v2/api-keys/zk_01jy…/budget \
-H "Authorization: Bearer zk_live_..." \
-H "Content-Type: application/json" \
-d '{"limit_usd": 25.0}'
Pass null to clear the per-key limit. A key that exhausts its budget gets 429 quota_exceeded, even
if the project cap has room.
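The rules described so far (credits, project cap, overage mode, per-key limit) can be modelled client-side to predict which error a request would hit. This is only an illustrative sketch: `expected_rejection` is a hypothetical helper, the order of the checks is an assumption, and so is the choice to keep enforcing a per-key limit when overage is allowed. The server's actual evaluation may differ.

```python
from typing import Optional


def expected_rejection(
    credit_balance_micros: int,
    cycle_spend_micros: int,
    monthly_budget_micros: Optional[int],
    overage_mode: str,
    key_spend_micros: int = 0,
    key_limit_micros: Optional[int] = None,
) -> Optional[str]:
    """Return the error a request would likely receive, or None if it should be served."""
    # An empty prepaid balance blocks all inference.
    if credit_balance_micros <= 0:
        return "402 credits_required"
    # A key that has exhausted its own budget is blocked even if the project cap has room.
    # Assumption: overage mode does not lift per-key limits.
    if key_limit_micros is not None and key_spend_micros >= key_limit_micros:
        return "429 quota_exceeded"
    # The project cap pauses inference unless overage was explicitly allowed.
    cap_hit = (
        monthly_budget_micros is not None
        and cycle_spend_micros >= monthly_budget_micros
    )
    if cap_hit and overage_mode == "pause":
        return "429 quota_exceeded"
    return None


# The sample account from earlier: credits available, 25% of the cap used.
print(expected_rejection(4_200_000, 12_500_000, 50_000_000, "pause"))  # None
```

A model like this is mainly useful in dashboards or pre-flight checks; the API response remains the source of truth.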
Where spend is reported
Every billed generation is a usage event with a charged_micros field. Roll it up on /v2/usage,
optionally grouped by provider, model, profile, region, or day. The free workload diagnostic is
metered separately and does not require a payment method.
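The kind of roll-up described above can be sketched locally as a sum of charged_micros per group. Only the charged_micros field and the grouping dimensions come from the text; the event dictionaries, the provider and model values, and the `rollup` helper are hypothetical, and the real /v2/usage response shape is not shown here.

```python
from collections import defaultdict

# Hypothetical usage events; only "charged_micros" is a documented field name.
EVENTS = [
    {"provider": "provider_a", "model": "model-x", "charged_micros": 1_500},
    {"provider": "provider_a", "model": "model-y", "charged_micros": 500},
    {"provider": "provider_b", "model": "model-x", "charged_micros": 1_500},
]


def rollup(events, group_by=None):
    """Sum charged_micros over events, optionally grouped by one event field."""
    totals = defaultdict(int)
    for event in events:
        key = event[group_by] if group_by else "total"
        totals[key] += event["charged_micros"]
    return dict(totals)


print(rollup(EVENTS))              # {'total': 3500}
print(rollup(EVENTS, "provider"))  # {'provider_a': 2000, 'provider_b': 1500}
```

Summing in integer micros and converting to dollars at the end keeps the totals exact.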
Reduce the bill with reuse
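Cached input tokens bill at the cache-read rate rather than the full input rate. Order your prompt so the providers' caches actually hit: keep the stable content (system prompt, shared context) at the front and put the parts that change per request last.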