Bifrost gateway

The optional Tier 1 gateway in front of the Product API Core - auth, quotas, layered rate limits, automatic failover across 23+ providers, the OpenRouter emergency path, and where each limit is enforced.

Bifrost is the optional Tier 1 AI gateway that can sit in front of the Product API Core. It is the centralized entry point for provider execution: it holds the platform's provider keys, normalizes many providers behind one OpenAI-style interface, and handles failover and load balancing so the Core does not carry provider plumbing. It is a deploy and architecture component, not a customer-facing API.

Bifrost is an upstream, not a requirement. The Core runs standalone. When BIFROST_BASE_URL is set, managed-provider requests are sent OpenAI-style to the gateway; otherwise the Core talks to provider adapters directly, or in dev returns a deterministic placeholder. Do not substitute LiteLLM or other middleware - Bifrost is the designated high-performance gateway for all managed and BYOK provider traffic.
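
The upstream selection described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the Core's actual code: only BIFROST_BASE_URL comes from the text, while APP_ENV, the returned labels, and the exact condition that triggers the dev placeholder are hypothetical.

```python
from typing import Mapping


def select_upstream(env: Mapping[str, str]) -> str:
    """Pick where a managed-provider request goes.

    Only BIFROST_BASE_URL is taken from the documentation. APP_ENV and the
    returned labels are hypothetical names used for illustration.
    """
    if env.get("BIFROST_BASE_URL", "").strip():
        return "gateway"      # send OpenAI-style requests to Bifrost
    if env.get("APP_ENV") == "dev":
        return "placeholder"  # deterministic placeholder response in dev
    return "direct"           # the Core calls provider adapters itself
```

In a real process the argument would typically be `os.environ`; passing a plain mapping keeps the decision easy to test.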

Who owns what

The split is deliberate: the gateway owns transport and provider access; the Core owns everything that is Zumik-specific.

Bifrost (Tier 1)	Product API Core (behind it)
TLS termination, API-key auth, JWT, request ids	`/v1` and `/v2` surfaces, project policy
Global rate limits and quotas	Model alias resolution and immutable releases
Load balancing and automatic provider failover	Session state, branches, snapshots
Semantic caching	Diagnostics, replay, purge evidence
Access to 23+ providers behind one interface	QoS admission, billing, the usage meter

In production the Core also sits behind nginx + Cloudflare, which terminate TLS and apply edge and per-IP rate limits. Bifrost is the provider-facing layer, not the public edge.

Configuration

The gateway holds the five first-class provider keys from the environment and load-balances by weight with automatic failover. A minimal config:

infra/bifrost/config.json

{
  "providers": {
    "openai":    { "keys": [{ "value": "env.OPENAI_API_KEY",    "weight": 1 }] },
    "anthropic": { "keys": [{ "value": "env.ANTHROPIC_API_KEY", "weight": 1 }] },
    "xai":       { "keys": [{ "value": "env.XAI_API_KEY",       "weight": 1 }] },
    "gemini":    { "keys": [{ "value": "env.GEMINI_API_KEY",    "weight": 1 }] },
    "fireworks": { "keys": [{ "value": "env.FIREWORKS_API_KEY", "weight": 1 }] }
  },
  "routing": { "automatic_failover": true, "load_balancing": "weighted" },
  "semantic_cache": { "enabled": true, "ttl_seconds": 300 },
  "rate_limits": { "global_requests_per_minute": 6000 }
}

Keys are referenced as env.*, never inlined - the values come from the environment or a secret store (the Terraform provider-secrets module renders them out of band). The five first-class providers configured here are the cost- and speed-optimized adapters; Bifrost reaches the rest of its 23+ backends through the same interface.
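
A pre-deploy check can enforce the env.* rule before the config reaches the gateway. The following is a minimal sketch, not part of Bifrost: `find_unresolved_keys` is a hypothetical helper, and it assumes only the `providers` / `keys` / `value` shape shown in the config above.

```python
import json
from typing import Mapping


def find_unresolved_keys(config: dict, env: Mapping[str, str]) -> list[str]:
    """Report every provider key that is inlined or whose env variable is unset."""
    problems = []
    for provider, settings in config.get("providers", {}).items():
        for key in settings.get("keys", []):
            value = key.get("value", "")
            if not value.startswith("env."):
                problems.append(f"{provider}: key is inlined, expected env.*")
            elif not env.get(value[len("env."):]):
                problems.append(f"{provider}: {value} is not set")
    return problems


# Illustrative config: the second key is deliberately inlined to trigger a report.
sample = json.loads("""
{
  "providers": {
    "openai":    { "keys": [{ "value": "env.OPENAI_API_KEY", "weight": 1 }] },
    "anthropic": { "keys": [{ "value": "sk-inlined",         "weight": 1 }] }
  }
}
""")

print(find_unresolved_keys(sample, {"OPENAI_API_KEY": "set-in-env"}))
# prints ['anthropic: key is inlined, expected env.*']
```

Run against `os.environ` in CI, a check like this fails fast when a key is pasted into the file or a secret was not rendered.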

Providers and execution profiles

The managed-provider profile routes through Bifrost to the platform's contracted accounts across the five primary providers - OpenAI, Anthropic, xAI, Google Gemini, and Fireworks AI - plus the rest of Bifrost's 23+ backends. This is the default profile: fastest onboarding, lowest operational burden, and full access to provider-native prompt caching and Batch APIs.

The BYOK profile bypasses the platform keys: the Execution Broker calls the resolved provider with the customer's own sealed credential. The BYOC profile routes concentrated, reusable workloads to self-hosted clusters. Which profile served a request comes back on Agent-Execution-Profile.

Failover and the emergency fallback

Bifrost does automatic multi-provider failover within the managed path. Beyond that, Zumik has one last-resort continuity layer: the OpenRouter emergency fallback. It is intentionally narrow: it fires only after a verified primary failure on a required model path, is never used for price arbitrage, is gated behind explicit policy, and is audited on every use.
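
The gating conditions above can be read as a single predicate. The sketch below is illustrative only: `FallbackRequest`, its field names, and the in-memory audit list are hypothetical stand-ins, not Zumik's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FallbackRequest:
    primary_failure_verified: bool  # primary path confirmed failed, not merely slow
    model_path_required: bool       # the request needs this specific model path
    policy_allows_fallback: bool    # explicit policy opt-in


def allow_emergency_fallback(req: FallbackRequest, audit_log: list) -> bool:
    # Price is deliberately not an input: the fallback is a continuity
    # mechanism, never a way to pick a cheaper route.
    allowed = (
        req.primary_failure_verified
        and req.model_path_required
        and req.policy_allows_fallback
    )
    # Record every decision, allowed or not, so each use leaves a trail.
    audit_log.append({"allowed": allowed, "request": req})
    return allowed
```

All three conditions must hold together; failing any one keeps the request on the managed path.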

The execution mode is reported on Agent-Execution-Mode: live (primary gateway), openrouter_fallback, or placeholder (no gateway configured).
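
A caller can inspect these headers to notice when a response did not come from the primary gateway. This is a minimal sketch: the header names and the three mode values come from the text, while `read_execution_headers` and the `degraded` flag are hypothetical. The Agent-Execution-Profile value is treated as an opaque string because its exact values are not listed here.

```python
from typing import Mapping

KNOWN_MODES = {"live", "openrouter_fallback", "placeholder"}


def read_execution_headers(headers: Mapping[str, str]) -> dict:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    mode = lowered.get("agent-execution-mode")
    return {
        "mode": mode,
        "profile": lowered.get("agent-execution-profile"),  # opaque string
        "mode_is_known": mode in KNOWN_MODES,
        # Hypothetical flag: anything other than the primary gateway.
        "degraded": mode in {"openrouter_fallback", "placeholder"},
    }
```

A monitoring hook could, for example, count responses where `degraded` is true and alert when the rate rises.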

Rate limiting

Limits are layered; no single choke point is trusted:

Cloudflare edge

Per-IP thresholds drop or challenge abusive traffic before it reaches the origin, with stricter limits on auth and inference endpoints. See Terraform for the WAF rules.

Bifrost

Per-API-key request-per-minute and token-per-minute limits, plus the global RPM cap, on inference endpoints. Returns 429 with Retry-After when exceeded.

API Core

Per-key request-rate limiting and per-project budgets. Exceeding the request rate returns 429 rate_limit_exceeded; exhausting a project budget returns 429 quota_exceeded, so the two cases can be told apart.

Read endpoints (GET /v1/models, GET /v2/artifacts/{id}) carry higher limits than the inference endpoints. See troubleshooting for the rate-limit and quota error codes.
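
On the client side, the two 429 cases call for different handling: a rate limit is worth retrying after a pause, a budget limit is not. The sketch below assumes the error code arrives in a JSON body shaped like `{"error": {"code": "..."}}` and that Retry-After carries whole seconds; both are assumptions, and `call_with_backoff` is a hypothetical helper rather than part of any Zumik SDK.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], dict]  # (status, headers, JSON body)


def call_with_backoff(send: Callable[[], Response], max_attempts: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        # Assumed body shape: {"error": {"code": "..."}}.
        code = body.get("error", {}).get("code")
        if code == "quota_exceeded":
            return status, headers, body  # budget exhausted: retrying will not help
        if attempt < max_attempts - 1:
            # Assumes Retry-After is in whole seconds; otherwise back off exponentially.
            retry_after = headers.get("Retry-After", "")
            delay = float(retry_after) if retry_after.isdigit() else float(2 ** attempt)
            sleep(delay)
    return status, headers, body  # still rate-limited after the last attempt
```

Here `send` is any zero-argument callable that performs the request, which keeps the retry policy independent of the HTTP library in use.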