Zumik
Execution profiles

Managed providers

The default execution profile - Zumik's contracted provider accounts reached through the Bifrost gateway, with provider-native prompt caching, Batch APIs, and service tiers turned on for you.

The managed-provider profile is what every project runs on until it has a measured reason to do otherwise. Requests resolve to a model and execute against Zumik's contracted accounts across the five first-class providers, reached through the Bifrost gateway. You write no provider keys, you manage no infrastructure, and you still get every provider-native discount.

This is the path the Agent-Execution-Profile: managed_provider header reports.

The default path

Client

Bifrost gateway (Tier 1)        auth, quotas, global rate limits, failover, 23+ providers

Product API Core                alias resolution, project policy, the Execution Broker

Managed provider adapter

Company-managed provider account

The Core resolves the model through its immutable alias release, the Execution Broker selects this profile when there is no subscription or BYOK credential, and Bifrost holds the contracted keys and normalizes the provider behind an OpenAI-style surface. In development, with no gateway configured, the Core returns a deterministic placeholder instead of calling a provider, and the Agent-Execution-Mode header reads placeholder.

Bifrost is an upstream, not a hard requirement. With BIFROST_BASE_URL set, managed requests route through the gateway; otherwise the Core calls provider adapters directly. Either way the managed-provider profile means Zumik's accounts pay the provider, and the customer pays Zumik.

What you get for free

The managed profile turns on the provider-native economics without any client instrumentation. Each provider does this differently; the capability manifest records the exact facts the broker routes on.

ProviderPrompt cacheBatch APIService tiers
OpenAIAutomatic, 50% read discountYes, 50% off, 24hflex / default / scale
AnthropicExplicit, 90% read discountYes, 50% off, 24hstandard
xAICached context, 75% read discountNostandard
GeminiImplicit + explicit, 75% read discountYes, 50% off, 24hstandard
FireworksNoneAsync batchserverless / dedicated

Caching

The platform's bias is to capture provider-native caching before it considers anything heavier. Stable content at the front of the request, volatile content at the tail, and the discount lands on whichever provider answers. The per-provider mechanics are recorded in the capability manifest; Zumik reports realized reuse with an evidence level so a predicted discount is never reported as a measured one.

Batch and service tiers

Non-interactive work (background evaluations, diagnostic reprocessing, pre-computable tool calls, replay experiments) routes to a provider Batch API where one exists, for a 50% cost reduction at 24h turnaround. Interactive and QoS interactive requests never go to Batch. OpenAI's flex / default / scale tiers and Fireworks' serverless / dedicated tiers are matched to the request's QoS class so a latency-cost tradeoff is made deliberately, not by accident.

When the managed path is the right answer

You want to start

Fastest onboarding and the lowest operational burden. One base-URL change on /v1 and you are live against contracted accounts.

You need breadth

Broad model coverage across the five first-class providers plus the rest of Bifrost's 23+ backends, with automatic multi-provider failover.

Caching does the work

For most workloads provider-native caching captures the available reuse, which is exactly the condition under which BYOC loses on cost.

You have no procurement constraint

No existing provider agreement to honor and no account-level retention requirement that forces your own key.

When one of those conditions changes - you hold a provider agreement, you need account-level retention, or a few model paths concentrate hard enough to clear the replay bar - the next step is BYOK or, with proof, BYOC.

On this page