BYOK profile
Bring your own provider key for any of the five first-class providers. Zumik calls the resolved provider with your sealed credential, you keep the billing relationship, and you inherit every provider-native optimization.
BYOK (bring your own key) bypasses Zumik's contracted accounts. The Execution Broker calls the resolved provider with the customer's own sealed credential, the provider bills the customer directly, and Zumik charges its platform fee on top. Everything else - alias resolution, project policy, session state, diagnostics, purge evidence - stays exactly as it is on the managed path.
BYOK is first-class for all five primary providers: OpenAI, Anthropic, xAI, Google Gemini, and Fireworks AI. The profile that served the request is returned in the Agent-Execution-Profile response header (byok on this path), and the ID of the credential used is returned in Agent-Byok-Credential-Id.
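A client can confirm which profile served a request by reading those two headers. The following is a minimal sketch, assuming the response headers are available as an ordinary mapping; the helper name served_by_byok and the sample credential ID are hypothetical and not part of Zumik's API.

```python
from typing import Mapping, Optional, Tuple


def served_by_byok(headers: Mapping[str, str]) -> Tuple[bool, Optional[str]]:
    """Return (is_byok, credential_id) from a response's headers.

    The header names come from the text above. HTTP header names are
    case-insensitive, so keys are normalized before lookup.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    profile = normalized.get("agent-execution-profile", "").strip().lower()
    is_byok = profile == "byok"
    # Only report a credential ID when the request actually ran under BYOK.
    credential_id = normalized.get("agent-byok-credential-id") if is_byok else None
    return is_byok, credential_id


# Hypothetical headers; the credential ID format is an assumption.
example = {
    "Agent-Execution-Profile": "byok",
    "Agent-Byok-Credential-Id": "cred_example_123",
}
print(served_by_byok(example))
```

Logging the credential ID next to each request can make it easier to see which sealed key was in use, for example during a rotation.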
The path
Client
↓
Product API Core: alias resolution, project policy, the Execution Broker
↓
Provider adapter: called with the customer's decrypted key
↓
Customer-owned provider account
The broker selects BYOK when the project has a stored credential for the resolved provider and no subscription route applies. The decrypted key is held in zeroizing memory for the call and wiped when the credential drops - it is never logged and never returned. See BYOK setup and the provider-credentials API for sealing and rotating keys.
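The selection rule can be read as a simple predicate. The sketch below is illustrative only: the Project type, its field names, and the idea that subscription routes are tracked per provider are assumptions made for the example, not a description of the broker's internals.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Project:
    # Providers for which the project has a stored (sealed) credential.
    credential_providers: Set[str] = field(default_factory=set)
    # Providers for which a subscription route applies (hypothetical model).
    subscription_providers: Set[str] = field(default_factory=set)


def selects_byok(project: Project, resolved_provider: str) -> bool:
    """BYOK applies when a credential exists and no subscription route does."""
    has_credential = resolved_provider in project.credential_providers
    subscription_applies = resolved_provider in project.subscription_providers
    return has_credential and not subscription_applies


project = Project(
    credential_providers={"anthropic", "openai"},
    subscription_providers={"openai"},
)
print(selects_byok(project, "anthropic"))  # credential stored, no subscription route
print(selects_byok(project, "openai"))     # a subscription route takes precedence
print(selects_byok(project, "xai"))        # no stored credential
```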
What it inherits
BYOK keeps the full provider-native cost and speed surface. The work runs under your key, but the broker still routes for the same optimizations the managed path uses:
- Anthropic cache_control breakpoints and the 90% cache-read discount.
- Gemini implicit caching and the Context Caching API.
- Fireworks dedicated-tier latency and speculative decoding.
- OpenAI and Anthropic Batch APIs for non-interactive work.
Provider-native caching works identically - the discount shows up in cached_tokens on /v1 and in the full reuse waterfall on /v2/usage, because it is the same provider mechanism, just billed to your account. The per-provider facts live in the capability manifest.
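To make the effect of a cache-read discount concrete, here is a rough worked example. It is a sketch under stated assumptions: the base price and token counts are hypothetical, the 90% figure is the Anthropic cache-read discount mentioned above, and any cache-write surcharge is ignored. Actual rates come from your provider's price list.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def input_cost(
    uncached_tokens: int,
    cached_tokens: int,
    base_price_per_million: Decimal,
    cache_read_discount: Decimal = Decimal("0.90"),
) -> Decimal:
    """Input-token cost when cached reads are billed at a discounted rate.

    Ignores cache-write surcharges; all prices here are illustrative.
    """
    cached_price = base_price_per_million * (Decimal(1) - cache_read_discount)
    uncached_cost = base_price_per_million * uncached_tokens / TOKENS_PER_MILLION
    cached_cost = cached_price * cached_tokens / TOKENS_PER_MILLION
    return uncached_cost + cached_cost


# Hypothetical request: 100,000 input tokens, 80,000 of them cache reads,
# at an assumed base price of 3.00 per million input tokens.
price = Decimal("3.00")
with_cache = input_cost(20_000, 80_000, price)
without_cache = input_cost(100_000, 0, price)
print(with_cache, without_cache)
```

Under these assumed numbers the cached request costs 0.084 against 0.30 uncached. On the BYOK path that difference appears on the provider's invoice to you.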
Caching under your own key is itself a reason some customers choose BYOK: the cache lives in the account they control, which can matter for account-level retention and compliance requirements.
Billing relationship
This is the practical difference from the managed profile. The provider invoices the customer for tokens; Zumik invoices the customer for the control plane (resolution, state, diagnostics, purge) plus its platform fee. There is no Zumik markup on the provider tokens themselves. The plans page covers how the platform fee is structured.
BYOK never falls back to OpenRouter. The customer's key is an explicit choice, so if the provider call fails, the request fails with a clear error rather than being silently brokered through a third party. This is the one behavioral difference from the managed path, which keeps an emergency fallback.
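The no-fallback rule can be sketched as follows. Everything here is hypothetical and for illustration only: the function and exception names, the profile strings, and the assumption that the managed path falls back on any provider failure. It is not Zumik's broker code.

```python
from typing import Callable


class ProviderCallError(Exception):
    """Raised when a BYOK provider call fails; no fallback is attempted."""


def execute(
    profile: str,
    call_provider: Callable[[], str],
    call_fallback: Callable[[], str],
) -> str:
    try:
        return call_provider()
    except Exception as exc:
        if profile == "byok":
            # The customer's key was an explicit choice: surface the failure
            # instead of routing the request through a third party.
            raise ProviderCallError(f"BYOK provider call failed: {exc}") from exc
        # Managed path (assumed behavior): use the emergency fallback.
        return call_fallback()
```

A caller on the BYOK path should therefore expect to handle provider outages itself, for example with its own retry or alerting logic.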
When to use BYOK
Existing provider agreements
You already have contracted rates, committed-use discounts, or an enterprise agreement with a provider and want to run on them.
Account-level retention
Your provider account carries a private retention or zero-retention policy that has to apply to the inference calls.
Quota and rate reservations
You have customer-specific rate limits or reserved quota that live on your provider account, not Zumik's.
Procurement and billing control
You have procurement constraints, or you need to own the billing relationship with the provider directly.
If traffic concentrates on a few model paths heavily enough that dedicated infrastructure would beat both managed and BYOK on blended cost, the next escalation is BYOC - but only after replay proves it.
Managed providers
The default execution profile - Zumik's contracted provider accounts reached through the Bifrost gateway, with provider-native prompt caching, Batch APIs, and service tiers turned on for you.
BYOC profile
Self-host the inference data plane in your own cloud while Zumik keeps owning policy, resolution, and purge evidence. When and why to do it, replay-gated activation, and the control-plane ownership split. Running it needs your own GPUs.