Execution profiles
The four ways Zumik runs a request - managed providers by default, BYOK, BYOC, and hybrid - plus the OpenRouter emergency fallback, and the rule that BYOC is activated only when replay proves it wins.
Every request resolves to a model and then runs through one of four execution profiles. The profile decides whose provider account is billed, whose GPUs serve the tokens, and how strong a purge guarantee Zumik can make. The order below is the order of escalation: start managed, add the next profile only when there is a concrete reason to.
The profile that actually served a request comes back on the Agent-Execution-Profile response
header (managed_provider, byok, subscription, byoc_dynamo, or byoc_epp), so the routing
decision is never a black box.
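As a minimal sketch of how a client might read that header, the helper below looks it up case-insensitively and checks it against the five values listed above. The function name `served_profile` and the plain-dict representation of the response headers are assumptions for illustration, not part of Zumik's SDK.

```python
# Header values listed in the text above.
KNOWN_PROFILES = {"managed_provider", "byok", "subscription", "byoc_dynamo", "byoc_epp"}


def served_profile(headers):
    """Return the execution profile named in the response headers, or None.

    `headers` is any mapping of header name to value (hypothetical input shape).
    """
    # HTTP header names are case-insensitive, so normalise before looking up.
    normalised = {name.lower(): value for name, value in headers.items()}
    value = normalised.get("agent-execution-profile")
    if value is None:
        return None
    value = value.strip()
    if value not in KNOWN_PROFILES:
        raise ValueError(f"unexpected execution profile: {value!r}")
    return value
```

Logging the returned value next to each request ID is one simple way to audit which profile served which traffic.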
Managed providers (default)
Zumik's contracted accounts across OpenAI, Anthropic, xAI, Gemini, and Fireworks, reached through the Bifrost gateway. Fastest onboarding, lowest operational burden, full access to provider-native caching, Batch APIs, and service tiers.
BYOK
Bring your own provider key. Zumik calls the resolved provider with your sealed credential and you keep the billing relationship. Inherits every provider-native optimization the managed path has.
BYOC
Self-host the inference data plane in your own cloud while Zumik's control plane keeps owning policy, resolution, and purge evidence. Replay-gated: you only turn it on when it wins.
Hybrid
Managed providers for broad coverage and overflow, with BYOC hot lanes carrying the few model paths concentrated enough to justify dedicated infrastructure.
Pick the cheapest reliable path
The platform's whole bias is to exhaust provider-native economics before it touches infrastructure. For most workloads the 90% Anthropic cache-read discount, Gemini implicit caching, or Fireworks open-source routing closes the cost gap long before self-hosting would. Escalating to self-hosted infrastructure makes sense only once those provider-native savings are exhausted.
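To make the cache-read figure concrete, here is a rough arithmetic sketch of what a 90% discount on cached input does to input spend. The function name and parameters are illustrative assumptions, and the model deliberately ignores cache-write surcharges, output tokens, and cache expiry, so treat it as an upper bound on the saving rather than a pricing calculator.

```python
def effective_input_cost(base_cost, cache_hit_fraction, cache_read_discount=0.90):
    """Estimate input spend when a fraction of input tokens is served from cache.

    base_cost: input spend with no caching at all (any currency unit).
    cache_hit_fraction: share of input tokens read from cache, between 0 and 1.
    cache_read_discount: discount applied to cached reads (0.90 per the text).
    """
    if not 0.0 <= cache_hit_fraction <= 1.0:
        raise ValueError("cache_hit_fraction must be between 0 and 1")
    # Cached tokens are billed at the discounted rate, the rest at full price.
    cached = base_cost * cache_hit_fraction * (1.0 - cache_read_discount)
    uncached = base_cost * (1.0 - cache_hit_fraction)
    return cached + uncached
```

Under these assumptions, a workload with 100 units of uncached input spend and 80% of its input tokens read from cache lands at roughly 28 units, which is the kind of gap self-hosting would have to beat.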
| Profile | Provider account | Data plane | When to reach for it |
|---|---|---|---|
| Managed | Zumik | Provider cloud | Default. Broad coverage, no ops, every provider-native discount. |
| BYOK | Customer | Provider cloud | Existing provider agreements, account-level retention, customer-controlled billing. |
| BYOC | Customer | Customer GPUs | Replay proves dedicated SLOs, hot-model volume, private networking, or stronger purge evidence. |
| Hybrid | Both | Both | A few dominant model paths plus everything else. |
The broker picks the profile once per request, in a fixed order: a customer subscription wins first (the bundled allowance is the cheapest option), then a BYOK credential, and otherwise the managed-provider path. See execution profiles for the concept-level model.
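The precedence described above can be sketched as a short function. This is an illustration of the stated order only: the `RequestContext` type and its field names are hypothetical, and BYOC routing is left out because the text does not say where it sits in this order.

```python
from dataclasses import dataclass


@dataclass
class RequestContext:
    # Hypothetical flags; the real broker's inputs are not documented here.
    has_subscription: bool = False
    has_byok_credential: bool = False


def pick_profile(ctx):
    """Apply the stated order: subscription, then BYOK, then managed."""
    if ctx.has_subscription:
        return "subscription"
    if ctx.has_byok_credential:
        return "byok"
    return "managed_provider"
```

Note that a subscription wins even when a BYOK credential is also present, since the order is evaluated top to bottom.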
BYOC is replay-justified, not volume-justified
Zumik does not recommend or activate BYOC on the basis of prefix length, raw request volume, or a gut feeling that self-hosting is cheaper. It activates BYOC only when replay proves that the blended total cost - infrastructure, operations, the platform fee, and engineering burden - beats the best managed-provider path for the same workload at the same reliability level.
The entry point is a workload diagnostic returning
byoc_pilot_worth_evaluating; the gate is a replay run over recorded traffic. The most common
outcome of that analysis is that managed-provider caching already captures the available reuse, and
the diagnostic says so plainly.
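The gate condition can be written down as a simple comparison. The sketch below assumes the four cost components named above are expressed in the same unit over the same replay window; the class, field, and function names are hypothetical, and how Zumik actually measures each component is not specified here.

```python
from dataclasses import dataclass


@dataclass
class ByocCostEstimate:
    # All four components in the same currency and time window (assumption).
    infrastructure: float
    operations: float
    platform_fee: float
    engineering_burden: float

    def blended_total(self):
        """Sum every component; leaving one out would bias the comparison."""
        return (
            self.infrastructure
            + self.operations
            + self.platform_fee
            + self.engineering_burden
        )


def replay_justifies_byoc(byoc, best_managed_cost, byoc_meets_reliability):
    """True only if BYOC is strictly cheaper at the same reliability level."""
    return byoc_meets_reliability and byoc.blended_total() < best_managed_cost
```

A tie or a reliability shortfall returns False, matching the bias toward staying on the managed path unless replay shows a clear win.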
Running BYOC also requires GPU hardware in your own cloud. The BYOC stack and portable Kubernetes pages are deployment and architecture docs; they do not describe a turnkey hosted runtime.
OpenRouter is the emergency exit, not a profile
Beyond these four profiles there is one last-resort continuity layer: OpenRouter emergency fallback. It fires only after a verified primary failure for a required model path, never for price arbitration or routine routing. It is gated behind explicit policy, and every use is written to the audit log. It is deliberately left out of the table above because it is not a way to run normal traffic.
A project whose regional policy sets a strict or no_openrouter data
boundary never uses the fallback, and BYOK traffic never falls back to OpenRouter - your key is your
explicit choice. When fallback is not permitted, Zumik returns a clear degraded response rather than
silently crossing a policy boundary.
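Taken together, the fallback conditions amount to a conjunction of checks. The sketch below encodes them as stated; the function name, parameter names, and the idea of passing the data boundary as a string are assumptions for illustration, and audit logging is omitted.

```python
# Data-boundary values named in the text that rule out the fallback entirely.
BLOCKING_BOUNDARIES = {"strict", "no_openrouter"}


def may_use_openrouter_fallback(
    primary_failure_verified,
    model_path_required,
    policy_allows_fallback,
    data_boundary,
    profile,
):
    """Return True only when every documented precondition holds."""
    # Fallback is for verified failures on required paths, never routine routing.
    if not (primary_failure_verified and model_path_required):
        return False
    # Explicit policy must enable it.
    if not policy_allows_fallback:
        return False
    # A strict or no_openrouter data boundary always wins.
    if data_boundary in BLOCKING_BOUNDARIES:
        return False
    # BYOK traffic never falls back.
    if profile == "byok":
        return False
    return True
```

When this returns False after a primary failure, the caller would return the degraded response rather than reroute.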
Agent runtime
A multi-turn agent loop on Zumik's native /v2 session and branch surface - an append-only transcript with optimistic concurrency and interactive QoS.
Managed providers
The default execution profile - Zumik's contracted provider accounts reached through the Bifrost gateway, with provider-native prompt caching, Batch APIs, and service tiers turned on for you.