Multi-region & global infrastructure
Zumik's region model - the control-plane region registry, the GET /v2/regions map, data residency, and the Cloudflare load-balancer that geo-steers api.zumik.ai across regions with /healthz failover.
Zumik separates two ideas that both get called "region", and getting them straight is the whole mental model:
- A residency zone (us, eu, global) is where customer data is processed. It's the vocabulary a project's regional policy speaks, and what inference routing enforces: a project with region_policy: eu_only never routes to a provider outside the EU zone.
- A control-plane region is where Zumik runs the API. Today one region is active, the OVH VPS in Beauharnois, Canada (ca-central), with US-East and EU-West declared for build-out.
The control plane is region-aware in code and ships the infrastructure-as-code to run in several regions, but the live footprint is a single active region. Planned regions are labelled as such everywhere: the platform never reports a region as available before it has been stood up.
The region registry
The footprint is declared as data in crates/zumik-core/src/regions.rs and surfaced read-only:
| Code | Location | Residency zone | Direct host | Status |
|---|---|---|---|---|
| ca-central | Canada (Beauharnois) | global | api.zumik.ai | active |
| us-east | United States (East) | us | us.api.zumik.ai | planned |
| eu-west | European Union (West) | eu | eu.api.zumik.ai | planned |
The control plane runs under PIPEDA / Québec Law 25 in Canada. Managed-provider inference still executes in the provider's own zone (US today); residency for that is governed by the project's regional policy, not by where the control plane sits.
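To make "declared as data" concrete, here is a minimal Python mirror of the registry table. It is a sketch under stated assumptions. The real definitions live in regions.rs (Rust). The field names follow the /v2/regions response. The helpers lookup and available_regions are hypothetical and only illustrate the rule that planned regions are never reported as available.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    ACTIVE = "active"
    PLANNED = "planned"


@dataclass(frozen=True)
class Region:
    code: str
    display_name: str
    residency_zone: str  # "us", "eu" or "global"
    endpoint_host: str
    status: Status


# Hypothetical mirror of the registry table; regions.rs is the source of truth.
REGIONS = (
    Region("ca-central", "Canada (Beauharnois)", "global", "api.zumik.ai", Status.ACTIVE),
    Region("us-east", "United States (East)", "us", "us.api.zumik.ai", Status.PLANNED),
    Region("eu-west", "European Union (West)", "eu", "eu.api.zumik.ai", Status.PLANNED),
)


def lookup(code: str) -> Optional[Region]:
    """Return the registry entry for a region code, or None if it is not declared."""
    return next((r for r in REGIONS if r.code == code), None)


def available_regions() -> List[str]:
    """Only regions that have actually been stood up (status active) count as available."""
    return [r.code for r in REGIONS if r.status is Status.ACTIVE]
```

Keeping the footprint as immutable data means a new region becomes visible by flipping one status field, not by touching routing code.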
GET /v2/regions
One call returns the footprint, which region is serving the request, and the caller's residency policy - so a dashboard can show both where Zumik runs and where the project's traffic is allowed to process.
```bash
curl https://api.zumik.ai/v2/regions \
  -H "Authorization: Bearer $ZUMIK_API_KEY"
```
```json
{
  "object": "list",
  "home_region": "ca-central",
  "residency_policy": { "allowed_zones": [], "unrestricted": true },
  "data": [
    {
      "code": "ca-central",
      "display_name": "Canada (Beauharnois)",
      "geography": "North America",
      "residency_zone": "global",
      "endpoint_host": "api.zumik.ai",
      "status": "active",
      "serving": true
    },
    { "code": "us-east", "status": "planned", "serving": false, "...": "..." },
    { "code": "eu-west", "status": "planned", "serving": false, "...": "..." }
  ]
}
```
residency_policy.allowed_zones mirrors exactly what inference enforcement gates on. An empty list
means unrestricted; ["eu"] means a request that resolves to a non-EU provider is rejected with
region_not_allowed (403).
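A client can read both halves of that response with a few lines of code. The sketch below finds the serving region in the data list and applies the residency rule exactly as stated above: an empty allowed_zones list is unrestricted, otherwise the provider's zone must be listed. The function names are hypothetical. The rejection helper only reproduces the documented 403 / region_not_allowed outcome and is not Zumik's enforcement code.

```python
import json
from typing import Optional, Tuple


def serving_region(body: str) -> Optional[str]:
    """Return the code of the region flagged "serving" in a /v2/regions response."""
    doc = json.loads(body)
    for region in doc["data"]:
        if region.get("serving"):
            return region["code"]
    return None


def zone_allowed(residency_policy: dict, provider_zone: str) -> bool:
    """Documented rule: empty allowed_zones means unrestricted; otherwise the zone must be listed."""
    allowed = residency_policy.get("allowed_zones", [])
    return not allowed or provider_zone in allowed


def rejection(residency_policy: dict, provider_zone: str) -> Optional[Tuple[int, str]]:
    """Return (HTTP status, error code) if the request would be rejected, else None."""
    if zone_allowed(residency_policy, provider_zone):
        return None
    return (403, "region_not_allowed")
```

For example, with allowed_zones set to ["eu"], a request that resolves to a provider in the us zone yields (403, "region_not_allowed"), while the default empty list lets it through.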
Knowing which region served you
Every response carries the serving region, and /healthz echoes it, so failover is observable from
the client and from any load-balancer health check:
```console
$ curl -sI https://api.zumik.ai/v1/models | grep -i agent-control-region
agent-control-region: ca-central
$ curl -s https://api.zumik.ai/healthz
{"status":"ok","service":"api-core","region":"ca-central"}
```
/metrics exposes the same as an info gauge for dashboards and alerting:
```text
zumik_control_plane_region_info{region="ca-central"} 1
```
Global infrastructure: the load balancer
infra/terraform/modules/cloudflare-lb is the activation layer. When enabled it provisions:
- Health monitor: one HTTPS monitor probes every origin's /healthz and requires a 200. An origin that fails the configured number of consecutive probes (the monitor's retries setting) is drained; that is the failover trigger.
- One pool per region: each pool wraps a regional Zumik VPS origin and carries the Cloudflare steering regions whose traffic should prefer it (e.g. WEU/EEU -> the EU pool).
- Geo-steered load balancer: a zone-level LB on api.zumik.ai routes each client to the nearest healthy region (dynamic_latency), fails over in default_pool_ids order, and uses fallback_pool_id when every pool is unhealthy.
- Per-region direct hosts: us.api.zumik.ai and eu.api.zumik.ai are stable direct addresses for residency-pinned traffic that must not be steered away from its zone.
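The interplay of draining, steering and fallback can be modelled in a few lines. This is a simplified sketch, not Cloudflare's algorithm. It replaces dynamic_latency's RTT measurements with a static steering-region match. It treats each pool as a single origin, matching the one-VPS-per-region layout. The failure_threshold field is a hypothetical stand-in for the monitor's retries setting. The ENAM steering region for the Canadian pool is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pool:
    name: str
    steering_regions: List[str]   # e.g. ["WEU", "EEU"] for the EU pool
    failure_threshold: int = 2    # hypothetical stand-in for the monitor's retries setting
    consecutive_failures: int = 0

    @property
    def healthy(self) -> bool:
        return self.consecutive_failures < self.failure_threshold

    def record_probe(self, status_code: int) -> None:
        # A /healthz probe passes only on HTTP 200; any other status counts as a failure.
        if status_code == 200:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1


def choose_pool(client_region: str, default_pools: List[Pool], fallback: Pool) -> Pool:
    # 1. Prefer a healthy pool that lists the client's steering region.
    for pool in default_pools:
        if pool.healthy and client_region in pool.steering_regions:
            return pool
    # 2. Otherwise fail over to the first healthy pool in default_pool_ids order.
    for pool in default_pools:
        if pool.healthy:
            return pool
    # 3. Every pool is unhealthy: send traffic to the fallback pool.
    return fallback
```

In this model a WEU client lands on the EU pool until that origin fails failure_threshold probes in a row, then moves to the next healthy pool in default order, and returns as soon as a probe succeeds again.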
The module is gated behind multi_region.enable (default false), so a single-region deployment creates no
load-balancer resources and the single-origin DNS keeps serving.
State model and active/passive
api-core persists its working set as a single JSON snapshot on the box (a single-writer store),
so two regions cannot both accept writes to the same tenant data without the Postgres migration the
storage layer is designed for. Until then the supported topology is active/passive: one region
owns writes, the others are warm standbys that restore the latest R2 snapshot and take over on
failover. Stateless inference fans out across regions freely; stateful /v2 writes pin to the
primary region's hostname.
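On the client side, the active/passive rule comes down to choosing a hostname per request. The sketch below pins /v2 writes to the primary and leaves everything else on the geo-steered api.zumik.ai. Several things here are assumptions rather than documented behaviour: treating POST/PUT/PATCH/DELETE as the "stateful writes", letting /v2 reads follow the load balancer, and passing the primary's hostname in as a parameter. A stricter client could pin all /v2 traffic to avoid reading from a standby.

```python
GEO_HOST = "api.zumik.ai"  # steered across regions by the load balancer

# Assumption: these methods are the "stateful writes" that must reach the primary.
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def target_host(method: str, path: str, primary_host: str) -> str:
    """Hypothetical helper: stateful /v2 writes go to the primary, the rest may be steered."""
    is_v2 = path == "/v2" or path.startswith("/v2/")
    if is_v2 and method.upper() in WRITE_METHODS:
        return primary_host
    return GEO_HOST
```

Keeping the decision in one function makes the later move to a multi-writer Postgres store a one-line change on the client.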
The full stand-up runbook (new box, state replication, token scopes, the api record cutover, and
failover verification) is in infra/deploy/MULTI_REGION.md.