Zumik
Guides

Billing and budgets

The pricing tiers, prepaid credits, hard monthly caps, 50/80/100% alerts, per-API-key budgets, and opt-in overage - plus how to read your account and what each error means.

Zumik bills pay-as-you-go on prepaid credits, with hard budget controls so spend can never surprise you. Costs are tracked in micros: 1_000_000 micros = $1.

Tiers

TierPriceIncludes
Pay-as-you-go creditsPrepaid, from $5Buy credits via Stripe; added to the balance in real time. Credits cover processed input and generated output tokens at published per-token rates. Unused credits stay on your balance.
Managed optimization pilot$5,000-$20,000 / month + usageDedicated sales engineering and SLA-backed optimization reviews.
BYOC pilot / enterpriseNegotiatedAnnual commitments via Stripe Invoicing.

BYOK and BYOC traffic bills the provider spend (or your own infrastructure) directly; Zumik charges only a control-plane fee per request on those paths. The fee applies on every path.

Add credits

Inference requires a positive prepaid credit balance. With an empty balance, every inference call returns 402 credits_required.

# Returns a PaymentIntent client secret for a $25 credit top-up. The console mounts the Stripe
# Payment Element with it, so the card is entered on Zumik and never leaves for a hosted page.
curl -X POST https://api.zumik.ai/v2/billing/payment-intent \
  -H "Authorization: Bearer zk_live_..." \
  -H "content-type: application/json" \
  -d '{"amount_usd": 25}'
# => { "client_secret": "pi_..._secret_...", "publishable_key": "pk_live_..." }

Use POST /v2/billing/portal to open the Stripe customer portal once you've made a top-up; it manages your card and invoices.

Read your account

GET /v2/billing/account returns the live state:

{
  "subscription_status": "none",
  "credit_balance_micros": 4200000,
  "monthly_budget_micros": 50000000,
  "cycle_spend_micros": 12500000,
  "overage_mode": "pause",
  "plan": "base"
}

subscription_status is none, active, past_due, or canceled. It no longer gates inference — a positive credit_balance_micros does. (It still reflects enterprise-contract state.) overage_mode is pause (default) or allow.

Hard caps and alerts

POST /v2/billing/budget sets a hard monthly cap in dollars. Setting it also arms soft alerts at 50%, 80%, and 100% of the cap, delivered by email and webhook off Stripe billing events (not polling).

curl -X POST https://api.zumik.ai/v2/billing/budget \
  -H "Authorization: Bearer zk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"monthly_budget_usd": 50.0}'

Pass null to remove the cap. A negative value returns 400 with param monthly_budget_usd.

When the cap is reached in pause mode, new inference returns 429 quota_exceeded - the same shape as OpenAI's quota error, so existing retry/backoff treats it identically.

Overage

By default inference pauses at the cap. To keep serving past it - billed at standard pay-as-you-go rates - opt in explicitly. confirm: true is required; enabling overage without it returns 400.

curl -X POST https://api.zumik.ai/v2/billing/overage \
  -H "Authorization: Bearer zk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"allow_overage": true, "confirm": true}'

The opt-in is recorded in the audit log.

Per-API-key budgets

A single API key can carry its own limit, independent of the project cap - useful so one team member's key can't drain a shared budget. POST /v2/api-keys/{key_id}/budget:

curl -X POST https://api.zumik.ai/v2/api-keys/zk_01jy…/budget \
  -H "Authorization: Bearer zk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"limit_usd": 25.0}'

Pass null to clear the per-key limit. A key that exhausts its budget gets 429 quota_exceeded, even if the project cap has room.

Where spend is reported

Every billed generation is a usage event with a charged_micros field. Roll it up on /v2/usage, optionally grouped by provider, model, profile, region, or day. The free workload diagnostic is metered separately and does not require a payment method.

Reduce the bill with reuse

Cached input bills at the read rate. Order your prompt so the providers' caches actually hit.

On this page