Bifrost gateway

The optional Tier 1 gateway in front of the Product API Core - auth, quotas, layered rate limits, automatic failover across 23+ providers, the OpenRouter emergency path, and where each limit is enforced.

Bifrost is the optional Tier 1 AI gateway that can sit in front of the Product API Core. It is the centralized entry point for provider execution: it holds the platform's provider keys, normalizes many providers behind one OpenAI-style interface, and handles failover and load balancing so the Core does not carry provider plumbing. It is a deployment and architecture component, not a customer-facing API.

Bifrost is an upstream, not a requirement. The Core runs standalone. When BIFROST_BASE_URL is set, managed-provider requests are sent OpenAI-style to the gateway; otherwise the Core talks to provider adapters directly, or in dev returns a deterministic placeholder. Do not substitute LiteLLM or other middleware - Bifrost is the designated high-performance gateway for all managed and BYOK provider traffic.
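The routing rule above can be sketched as a small decision function. This is a minimal illustration, not the Core's actual code: only BIFROST_BASE_URL comes from this page, while the function name, the `is_dev` flag, and the returned labels are hypothetical.

```python
from typing import Mapping


def choose_upstream(env: Mapping[str, str], is_dev: bool = False) -> str:
    """Return the path a managed-provider request would take.

    Assumption: `is_dev` stands in for however the Core detects a dev
    environment; the real condition is not specified on this page.
    """
    base_url = env.get("BIFROST_BASE_URL", "").strip()
    if base_url:
        # Gateway configured: send the request OpenAI-style to Bifrost.
        return "bifrost"
    if is_dev:
        # No gateway in dev: return a deterministic placeholder.
        return "placeholder"
    # No gateway outside dev: the Core calls provider adapters directly.
    return "direct_adapters"
```

In practice the mapping passed in would be `os.environ`; taking it as a parameter keeps the rule easy to test.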

Who owns what

The split is deliberate: the gateway owns transport and provider access; the Core owns everything that is Zumik-specific.

Bifrost (Tier 1)                                | Product API Core (behind it)
------------------------------------------------+----------------------------------------------
TLS termination, API-key auth, JWT, request ids | /v1 and /v2 surfaces, project policy
Global rate limits and quotas                   | Model alias resolution and immutable releases
Load balancing and automatic provider failover  | Session state, branches, snapshots
Semantic caching                                | Diagnostics, replay, purge evidence
Access to 23+ providers behind one interface    | QoS admission, billing, the usage meter

In production the Core also sits behind nginx + Cloudflare, which terminate TLS and apply edge and per-IP rate limits. Bifrost is the provider-facing layer, not the public edge.

Configuration

The gateway holds the five first-class provider keys from the environment and load-balances by weight with automatic failover. A minimal config:

infra/bifrost/config.json
{
  "providers": {
    "openai":    { "keys": [{ "value": "env.OPENAI_API_KEY",    "weight": 1 }] },
    "anthropic": { "keys": [{ "value": "env.ANTHROPIC_API_KEY", "weight": 1 }] },
    "xai":       { "keys": [{ "value": "env.XAI_API_KEY",       "weight": 1 }] },
    "gemini":    { "keys": [{ "value": "env.GEMINI_API_KEY",    "weight": 1 }] },
    "fireworks": { "keys": [{ "value": "env.FIREWORKS_API_KEY", "weight": 1 }] }
  },
  "routing": { "automatic_failover": true, "load_balancing": "weighted" },
  "semantic_cache": { "enabled": true, "ttl_seconds": 300 },
  "rate_limits": { "global_requests_per_minute": 6000 }
}

Keys are referenced as env.*, never inlined - the values come from the environment or a secret store (the Terraform provider-secrets module renders them out of band). The five first-class providers configured here are the cost- and speed-optimized adapters; Bifrost reaches the rest of its 23+ backends through the same interface.
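To make the env.* convention and weighted selection concrete, here is a small sketch of how a reference such as env.OPENAI_API_KEY could be resolved and how one key could be picked by weight. It illustrates the semantics only and is not Bifrost's implementation; `resolve_key` and `pick_key` are hypothetical helpers.

```python
import random
from typing import List, Mapping


def resolve_key(ref: str, env: Mapping[str, str]) -> str:
    """Resolve an 'env.NAME' reference against the environment."""
    if not ref.startswith("env."):
        # Inlined secrets are rejected: keys must be referenced, never embedded.
        raise ValueError("key must be referenced as env.NAME, not inlined")
    name = ref[len("env."):]
    value = env.get(name, "")
    if not value:
        raise KeyError(f"environment variable {name} is not set")
    return value


def pick_key(keys: List[dict], env: Mapping[str, str], rng: random.Random) -> str:
    """Pick one configured key with probability proportional to its weight."""
    weights = [k["weight"] for k in keys]
    chosen = rng.choices(keys, weights=weights, k=1)[0]
    return resolve_key(chosen["value"], env)
```

With a single key of weight 1 per provider, as in the config above, the selection is trivial; weights only matter once a provider lists several keys.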

Providers and execution profiles

The managed-provider profile routes through Bifrost to the platform's contracted accounts across the five primary providers - OpenAI, Anthropic, xAI, Google Gemini, and Fireworks AI - plus the rest of Bifrost's 23+ backends. This is the default profile: fastest onboarding, lowest operational burden, and full access to provider-native prompt caching and Batch APIs.

The BYOK profile bypasses the platform keys: the Execution Broker calls the resolved provider with the customer's own sealed credential. The BYOC profile routes concentrated, reusable workloads to self-hosted clusters. Which profile served a request comes back on Agent-Execution-Profile.

Failover and the emergency fallback

Bifrost performs automatic multi-provider failover within the managed path. Beyond that, Zumik has one last-resort continuity layer: the OpenRouter emergency fallback. It is intentionally narrow: it fires only after a verified primary failure on a required model path and never for price arbitration, it is gated behind explicit policy, and every use is audited.

The execution mode is reported on Agent-Execution-Mode: live (primary gateway), openrouter_fallback, or placeholder (no gateway configured).
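The gating conditions for the emergency fallback can be sketched as a single decision function that yields the value reported on Agent-Execution-Mode. This is a hedged illustration under stated assumptions: the three mode strings come from this page, but `FallbackContext`, its fields, the audit list, and the error raised when fallback is not permitted are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FallbackContext:
    primary_failure_verified: bool  # the primary failure has been confirmed
    model_path_required: bool       # the request needs this model path
    policy_allows_fallback: bool    # explicit policy enables the fallback


def execution_mode(gateway_configured: bool, primary_ok: bool,
                   ctx: FallbackContext, audit_log: List[str]) -> str:
    """Return the mode string for a request, auditing any fallback use."""
    if not gateway_configured:
        return "placeholder"
    if primary_ok:
        return "live"
    if (ctx.primary_failure_verified and ctx.model_path_required
            and ctx.policy_allows_fallback):
        # Every use of the emergency path leaves an audit record.
        audit_log.append("openrouter_fallback used")
        return "openrouter_fallback"
    # Assumption: when any gate is closed the request fails rather than
    # silently falling back.
    raise RuntimeError("primary failed and emergency fallback is not permitted")
```

Note that price never appears as an input: the function has no way to choose the fallback for cost reasons.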

Rate limiting

Limits are layered; no single choke point is trusted:

Cloudflare edge

Per-IP thresholds drop or challenge abusive traffic before it reaches the origin, with stricter limits on auth and inference endpoints. See Terraform for the WAF rules.

Bifrost

Per-API-key requests-per-minute and tokens-per-minute limits, plus the global RPM cap, on inference endpoints. Returns 429 with Retry-After when a limit is exceeded.

API Core

Per-key request-rate limiting and per-project budgets. An exceeded rate returns 429 rate_limit_exceeded, distinct from a budget 429 quota_exceeded.

Read endpoints (GET /v1/models, GET /v2/artifacts/{id}) carry higher limits than the inference endpoints. See troubleshooting for the rate-limit and quota error codes.
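On the client side, the two 429 variants call for different handling. The sketch below assumes the caller has already extracted the error code and the Retry-After header from the response; how the code is carried in the response body is not specified on this page, and `plan_retry`, its return labels, and the one-second default are hypothetical. Treating a budget 429 as non-retryable is also an assumption. Only the delta-seconds form of Retry-After is parsed.

```python
from typing import Optional, Tuple


def plan_retry(status: int, error_code: Optional[str],
               retry_after: Optional[str],
               default_delay: float = 1.0) -> Tuple[str, Optional[float]]:
    """Decide what a client should do with a response.

    Returns (action, delay_seconds), where action is "ok", "retry", or "stop".
    """
    if status != 429:
        return ("ok", None)
    if error_code == "quota_exceeded":
        # Budget exhausted: waiting a few seconds will not restore it.
        return ("stop", None)
    # rate_limit_exceeded (or an unlabelled 429): wait, then retry.
    delay = default_delay
    if retry_after is not None and retry_after.strip().isdigit():
        delay = float(retry_after.strip())
    return ("retry", delay)
```

For example, `plan_retry(429, "rate_limit_exceeded", "7")` asks the caller to wait 7 seconds before retrying, while a `quota_exceeded` response tells it to stop and surface the budget error.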
