Execution profiles

The four ways Zumik runs a request (managed providers by default, BYOK, BYOC, and hybrid), plus the OpenRouter emergency fallback and the rule that BYOC is activated only when replay proves it is justified.

Every request resolves to a model and then runs through one of four execution profiles. The profile decides whose provider account is billed, whose GPUs serve the tokens, and how strong a purge guarantee Zumik can make. The order below is the order of escalation: start managed, add the next profile only when there is a concrete reason to.

The profile that actually served a request comes back on the Agent-Execution-Profile response header (managed_provider, byok, subscription, byoc_dynamo, or byoc_epp), so the routing decision is never a black box.
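As a minimal sketch of how a client might read that header, the snippet below parses a plain dict of response headers. The header name and the five values come from the paragraph above; the helper names (`served_profile`, `runs_on_customer_gpus`) are hypothetical, and treating the two `byoc_*` values as the customer-GPU variants is an assumption based on their names.

```python
from enum import Enum
from typing import Mapping, Optional

HEADER_NAME = "Agent-Execution-Profile"


class ExecutionProfile(str, Enum):
    # Values as listed in the documentation above.
    MANAGED_PROVIDER = "managed_provider"
    BYOK = "byok"
    SUBSCRIPTION = "subscription"
    BYOC_DYNAMO = "byoc_dynamo"
    BYOC_EPP = "byoc_epp"


def served_profile(headers: Mapping[str, str]) -> Optional[ExecutionProfile]:
    """Return the profile named in the response headers, or None if absent."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get(HEADER_NAME.lower())
    if raw is None:
        return None
    # Raises ValueError for a value this sketch does not know about.
    return ExecutionProfile(raw.strip())


def runs_on_customer_gpus(profile: ExecutionProfile) -> bool:
    # Assumption: both byoc_* values denote the BYOC data plane.
    return profile in (ExecutionProfile.BYOC_DYNAMO, ExecutionProfile.BYOC_EPP)
```

Logging the parsed value next to each request ID is one simple way to confirm which path production traffic actually takes.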

Pick the cheapest reliable path

The platform's whole bias is to exhaust provider-native economics before it touches infrastructure. For most workloads the 90% Anthropic cache-read discount, Gemini implicit caching, or Fireworks open-source routing closes the cost gap long before self-hosting would. The escalation only makes sense once that ceiling is hit.
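To make the size of a cache-read discount concrete, here is a rough sketch of the effective input-token price as a function of how much of the prompt is served from cache. The 90% figure is the one quoted above; the function name, the unit price, and the cached fraction are illustrative assumptions, and the sketch deliberately ignores cache-write pricing and output tokens.

```python
def effective_input_cost(base_price: float,
                         cached_fraction: float,
                         cache_read_discount: float = 0.90) -> float:
    """Blended input price per token unit when part of the prompt is a cache read.

    base_price          -- uncached input price (any currency unit; hypothetical)
    cached_fraction     -- share of input tokens read from cache, in [0, 1]
    cache_read_discount -- discount on cached reads (0.90 = 90% off)
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached_price = base_price * (1.0 - cache_read_discount)
    return (1.0 - cached_fraction) * base_price + cached_fraction * cached_price


# With 80% of input tokens read from cache, the blended price is about
# 28% of the uncached price: 0.2 * 1.0 + 0.8 * 0.1 = 0.28.
blended = effective_input_cost(base_price=1.0, cached_fraction=0.8)
```

A self-hosted path has to beat this discounted figure, not the list price, which is why the comparison is left to replay over real traffic.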

| Profile | Provider account | Data plane | When to reach for it |
| --- | --- | --- | --- |
| Managed | Zumik | Provider cloud | Default. Broad coverage, no ops, every provider-native discount. |
| BYOK | Customer | Provider cloud | Existing provider agreements, account-level retention, customer-controlled billing. |
| BYOC | Customer | Customer GPUs | Replay proves dedicated SLOs, hot-model volume, private networking, or stronger purge evidence. |
| Hybrid | Both | Both | A few dominant model paths plus everything else. |

The broker picks the profile once per request: a customer subscription wins first (the bundled allowance is the cheapest), then a BYOK credential, otherwise the managed-provider path. See execution profiles for the concept-level model.
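The precedence described above can be sketched as a small pure function. This is an illustration of the stated order only, not the broker's actual implementation: `RequestContext`, its fields, and `pick_profile` are hypothetical names, and the returned strings reuse the header values listed earlier on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    # Hypothetical inputs; the real broker's inputs are not documented here.
    has_subscription: bool = False
    has_byok_credential: bool = False


def pick_profile(ctx: RequestContext) -> str:
    """Apply the documented order: subscription, then BYOK, then managed."""
    if ctx.has_subscription:
        return "subscription"      # bundled allowance is the cheapest path
    if ctx.has_byok_credential:
        return "byok"              # customer's own provider account
    return "managed_provider"      # default path
```

Because the decision is made once per request, a function like this has no hidden state: the same context always yields the same profile.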

BYOC is replay-justified, not volume-justified

Zumik does not recommend or activate BYOC on prefix length, raw request volume, or a gut feeling that self-hosting is cheaper. It activates BYOC only when replay proves that the blended total cost - infrastructure, operations, the platform fee, and engineering burden - beats the best managed-provider path for the same workload at the same reliability level.

The entry point is a workload diagnostic returning byoc_pilot_worth_evaluating; the gate is a replay run over recorded traffic. The most common outcome of that analysis is that managed-provider caching already captures the available reuse, and the diagnostic says so plainly.
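The cost rule above reduces to a single comparison, sketched here under stated assumptions. The four cost components are the ones named in the text; the `ByocCost` type, the `byoc_justified` function, and the choice to model "same reliability level" as a boolean are all simplifications introduced for illustration, and every figure would come from a replay run rather than from estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByocCost:
    # All amounts in the same currency and over the same replayed period.
    infrastructure: float
    operations: float
    platform_fee: float
    engineering: float

    def blended_total(self) -> float:
        return (self.infrastructure + self.operations
                + self.platform_fee + self.engineering)


def byoc_justified(byoc: ByocCost,
                   best_managed_cost: float,
                   byoc_meets_reliability: bool) -> bool:
    """True only if BYOC is cheaper at the same reliability level."""
    if not byoc_meets_reliability:
        # A cheaper but less reliable path does not count as a win.
        return False
    return byoc.blended_total() < best_managed_cost
```

Note that prefix length and raw request volume do not appear as inputs at all; they influence the outcome only through the replayed costs.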

Running BYOC also requires GPU hardware in your own cloud. The BYOC stack and portable Kubernetes pages document deployment and architecture; they do not describe a turnkey hosted runtime.

OpenRouter is the emergency exit, not a profile

Beyond these four profiles there is one last-resort continuity layer: OpenRouter emergency fallback. It fires only after a verified primary failure for a required model path, never for price arbitration or routine routing. It is gated behind explicit policy, and every use is written to the audit log. It is deliberately left out of the table above because it is not a way to run normal traffic.

A project whose regional policy sets a strict or no_openrouter data boundary never uses the fallback, and BYOK traffic never falls back to OpenRouter - your key is your explicit choice. When fallback is not permitted, Zumik returns a clear degraded response rather than silently crossing a policy boundary.
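The fallback conditions described in this section can be collected into one gate, sketched below. The boundary values `strict` and `no_openrouter` and the BYOK exclusion come from the text; the `FallbackRequest` type, its field names, the function names, the `"default"` boundary value used in the usage note, and the shape of the audit entry are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Data boundaries that forbid the fallback, as named in the text above.
BLOCKING_BOUNDARIES = frozenset({"strict", "no_openrouter"})


@dataclass(frozen=True)
class FallbackRequest:
    profile: str                    # e.g. "managed_provider" or "byok"
    data_boundary: str              # project's regional policy setting
    primary_failure_verified: bool  # the primary path is confirmed down
    model_path_required: bool       # the model path is required
    policy_allows_fallback: bool    # explicit opt-in policy


def may_use_openrouter(req: FallbackRequest) -> bool:
    """Every condition must hold; any single failure blocks the fallback."""
    return (req.primary_failure_verified
            and req.model_path_required
            and req.policy_allows_fallback
            and req.data_boundary not in BLOCKING_BOUNDARIES
            and req.profile != "byok")  # BYOK traffic never falls back


def handle_primary_failure(req: FallbackRequest,
                           audit_log: List[Dict[str, str]]) -> str:
    """Return the outcome and record every fallback use in the audit log."""
    if not may_use_openrouter(req):
        return "degraded"  # clear degraded response, no silent crossing
    audit_log.append({"event": "openrouter_fallback",
                      "profile": req.profile,
                      "data_boundary": req.data_boundary})
    return "openrouter_fallback"
```

For example, a managed request with a `"default"` boundary and all three flags set would fall back and leave one audit entry, while the same request under a `strict` boundary would return the degraded outcome and leave the log untouched.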
