Multi-region & global infrastructure

Zumik's region model - the control-plane region registry, the GET /v2/regions map, data residency, and the Cloudflare load-balancer that geo-steers api.zumik.ai across regions with /healthz failover.

Zumik separates two ideas that both get called "region", and getting them straight is the whole mental model:

  • A residency zone (us, eu, global) is where customer data is processed. It's the vocabulary a project's regional policy speaks, and what inference routing enforces. A region_policy: eu_only project never routes to a provider outside the EU zone.
  • A control-plane region is where Zumik runs the API. Today one region is active - the OVH VPS in Beauharnois, Canada (ca-central) - with US-East and EU-West declared for build-out.

The control plane is region-aware in code and ships the infrastructure-as-code to run in several regions; the live footprint is a single active region. Planned regions are labelled as such everywhere - the platform never reports a region it hasn't stood up as available.

The region registry

The footprint is declared as data in crates/zumik-core/src/regions.rs and surfaced read-only:

Code        Location               Residency zone  Direct host      Status
ca-central  Canada (Beauharnois)   global          api.zumik.ai     active
us-east     United States (East)   us              us.api.zumik.ai  planned
eu-west     European Union (West)  eu              eu.api.zumik.ai  planned

The control plane runs under PIPEDA / Québec Law 25 in Canada. Managed-provider inference still executes in the provider's own zone (US today); residency for that is governed by the project's regional policy, not by where the control plane sits.
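To make the "footprint declared as data" idea concrete, here is a minimal Python sketch of the registry's shape. The real registry is the Rust file named above and is not reproduced here; the class, field, and function names below are hypothetical, and only the row values come from the table.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    PLANNED = "planned"


@dataclass(frozen=True)
class Region:
    code: str
    location: str
    residency_zone: str  # "us", "eu", or "global"
    direct_host: str
    status: Status


# Row values copied from the registry table; the structure is illustrative.
REGISTRY = (
    Region("ca-central", "Canada (Beauharnois)", "global", "api.zumik.ai", Status.ACTIVE),
    Region("us-east", "United States (East)", "us", "us.api.zumik.ai", Status.PLANNED),
    Region("eu-west", "European Union (West)", "eu", "eu.api.zumik.ai", Status.PLANNED),
)


def available_regions(registry=REGISTRY):
    """Return only regions that are actually stood up (status is active)."""
    return [r for r in registry if r.status is Status.ACTIVE]


def lookup(code, registry=REGISTRY):
    """Find a declared region by code, or None if it is not in the registry."""
    return next((r for r in registry if r.code == code), None)
```

Keeping status on each row is what lets a planned region be listed without ever being reported as available.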

GET /v2/regions

One call returns the footprint, which region is serving the request, and the caller's residency policy - so a dashboard can show both where Zumik runs and where the project's traffic is allowed to process.

curl https://api.zumik.ai/v2/regions \
  -H "Authorization: Bearer $ZUMIK_API_KEY"

{
  "object": "list",
  "home_region": "ca-central",
  "residency_policy": { "allowed_zones": [], "unrestricted": true },
  "data": [
    {
      "code": "ca-central",
      "display_name": "Canada (Beauharnois)",
      "geography": "North America",
      "residency_zone": "global",
      "endpoint_host": "api.zumik.ai",
      "status": "active",
      "serving": true
    },
    { "code": "us-east", "status": "planned", "serving": false, "...": "..." },
    { "code": "eu-west", "status": "planned", "serving": false, "...": "..." }
  ]
}

residency_policy.allowed_zones mirrors exactly what inference enforcement gates on. An empty list means unrestricted; ["eu"] means a request that resolves to a non-EU provider is rejected with region_not_allowed (403).
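The rule above can be sketched in a few lines of Python. This is an illustration of the described behaviour, not Zumik's enforcement code: the exception class and function names are hypothetical, and the only facts taken from this page are the empty-list-means-unrestricted rule and the region_not_allowed (403) rejection.

```python
class RegionNotAllowed(Exception):
    """Stand-in for the region_not_allowed (403) rejection described above."""

    status_code = 403
    error_code = "region_not_allowed"


def allowed_zones_from_response(body):
    """Pull allowed_zones out of a parsed GET /v2/regions body."""
    return list(body["residency_policy"]["allowed_zones"])


def check_residency(allowed_zones, provider_zone):
    """Raise RegionNotAllowed when provider_zone falls outside the policy.

    An empty allowed_zones list means the project is unrestricted.
    """
    if not allowed_zones:
        return  # unrestricted: any provider zone is acceptable
    if provider_zone not in allowed_zones:
        raise RegionNotAllowed(
            f"provider zone {provider_zone!r} not in allowed zones {allowed_zones!r}"
        )
```

A dashboard could run the same check client-side to warn before a request is sent, with the server-side gate remaining the source of truth.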

Knowing which region served you

Every response carries the serving region, and /healthz echoes it, so failover is observable from the client and from any load-balancer health check:

$ curl -sI https://api.zumik.ai/v1/models | grep -i agent-control-region
agent-control-region: ca-central

$ curl -s https://api.zumik.ai/healthz
{"status":"ok","service":"api-core","region":"ca-central"}

/metrics exposes the same as an info gauge for dashboards and alerting:

zumik_control_plane_region_info{region="ca-central"} 1
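A client or probe could read the serving region from any of the three surfaces above and flag a change as a failover. The Python sketch below parses already-fetched data rather than making network calls. All helper names are hypothetical, and it assumes the info gauge carries only the region label shown in the sample line.

```python
import json
import re

# Assumption: the gauge line has exactly one label, region, and the value 1.
_INFO_LINE = re.compile(
    r'^zumik_control_plane_region_info\{region="([^"]+)"\}\s+1(?:\.0)?$'
)


def region_from_headers(headers):
    """Read the agent-control-region header, matching the name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "agent-control-region":
            return value.strip()
    return None


def region_from_healthz(body_text):
    """Read the region from a /healthz JSON body; None unless status is ok."""
    payload = json.loads(body_text)
    if payload.get("status") != "ok":
        return None
    return payload.get("region")


def region_from_metrics(text):
    """Find the region label on the info gauge in /metrics exposition text."""
    for line in text.splitlines():
        match = _INFO_LINE.match(line.strip())
        if match:
            return match.group(1)
    return None


class FailoverWatcher:
    """Remember the last region seen and report when it changes."""

    def __init__(self):
        self.last_region = None

    def observe(self, region):
        """Return True if region differs from the previously observed one."""
        changed = self.last_region is not None and region != self.last_region
        self.last_region = region
        return changed
```

Feeding each response's region into observe() gives a simple client-side signal that traffic has moved to another region.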

Global infrastructure: the load balancer

infra/terraform/modules/cloudflare-lb is the activation layer. When enabled it provisions:

Health monitor

One HTTPS monitor probes every origin's /healthz and requires a 200. An origin that fails the configured number of consecutive probes (the monitor's retries value) is drained - that's the failover trigger.

One pool per region

Each pool wraps a regional Zumik VPS origin. Pools carry the Cloudflare steering regions whose traffic should prefer them (e.g. WEU/EEU -> the EU pool).

Geo-steered load balancer

A zone-level LB on api.zumik.ai routes each client to the nearest healthy region (dynamic_latency) and fails over by default_pool_ids order, with a fallback_pool_id for the all-unhealthy case.

Per-region direct hosts

us.api.zumik.ai / eu.api.zumik.ai are stable direct addresses for residency-pinned traffic that must not be steered away from its zone.

It is gated behind multi_region.enable (default false), so a single-region deployment creates no load-balancer resources and the single-origin DNS keeps serving.
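The monitor and steering behaviour described above can be modelled in a short Python sketch. It follows this page's description only: Cloudflare's actual probe and steering logic is not reproduced, the class and function names are hypothetical, and the latency map stands in for whatever measurements dynamic_latency uses.

```python
class OriginHealth:
    """Tracks consecutive failed /healthz probes for one origin."""

    def __init__(self, retries):
        self.retries = retries  # failures in a row before the origin is drained
        self.consecutive_failures = 0

    def record_probe(self, status_code):
        """Record one probe result; only a 200 counts as a pass."""
        if status_code == 200:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.healthy

    @property
    def healthy(self):
        return self.consecutive_failures < self.retries


def pick_pool(default_pool_ids, fallback_pool_id, healthy, latency_ms=None):
    """Choose a pool for one client.

    default_pool_ids: pool ids in failover order.
    fallback_pool_id: pool used when no default pool is healthy.
    healthy: set of pool ids currently passing health checks.
    latency_ms: optional {pool_id: latency} map for this client.
    """
    candidates = [p for p in default_pool_ids if p in healthy]
    if not candidates:
        return fallback_pool_id  # the all-unhealthy case
    if latency_ms:
        measured = [p for p in candidates if p in latency_ms]
        if measured:
            # Lowest latency wins; ties keep default_pool_ids order.
            return min(measured, key=lambda p: latency_ms[p])
    return candidates[0]
```

With no latency data the function degrades to plain ordered failover, which is the behaviour to expect from the default_pool_ids list on its own.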

State model and active/passive

api-core persists its working set as a single JSON snapshot on the box (a single-writer store), so two regions cannot both accept writes to the same tenant data without the Postgres migration the storage layer is designed for. Until then the supported topology is active/passive: one region owns writes, the others are warm standbys that restore the latest R2 snapshot and take over on failover. Stateless inference fans out across regions freely; stateful /v2 writes pin to the primary region's hostname.
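One way a client could apply the active/passive rule is to choose the hostname per request. The Python sketch below is an assumption-heavy illustration: it treats "stateful /v2 write" as a mutating HTTP method on a /v2/ path, the function name is hypothetical, and the primary region's hostname is passed in rather than guessed.

```python
# Assumption: these methods are the ones that mutate state.
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def host_for(method, path, primary_host, steered_host="api.zumik.ai"):
    """Pick the hostname for a request under the active/passive topology.

    Stateful /v2 writes go to the region that owns writes (primary_host);
    everything else can use the geo-steered hostname.
    """
    is_stateful_write = method.upper() in WRITE_METHODS and path.startswith("/v2/")
    return primary_host if is_stateful_write else steered_host
```

Reads and stateless inference stay on the steered hostname, so only the write path depends on which region is currently primary.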

The full stand-up runbook - new box, state replication, token scopes, the api record cutover, and failover verification - is in infra/deploy/MULTI_REGION.md.
