Terraform
The infra/terraform modules - Cloudflare DNS, WAF rate limiting, and zone TLS settings, plus a provider-secrets renderer - with init/plan/apply, remote state on R2, and how sensitive variables stay out of plan output.
The infra/terraform modules manage the edge and secret surface of zumik.ai: the Cloudflare zone
that fronts the OVH origin VPS, and a renderer that turns the managed-provider and billing secrets into
a deliverable .env-shaped file. It deliberately does not provision the VPS (that stays a
documented runbook on a single box) and does not manage the GPU Kubernetes profiles; those are
covered by the BYOC stack and portable Kubernetes Helm charts.
Layout
terraform/
├── versions.tf               provider + Terraform version pins, remote (R2/S3) backend stub
├── providers.tf              Cloudflare provider (token from a sensitive var)
├── variables.tf              all root inputs; every secret marked sensitive, no defaults
├── main.tf                   zone lookup + module wiring
├── outputs.tf
├── terraform.tfvars.example
└── modules/
    ├── cloudflare/           DNS records, WAF rate-limit ruleset, zone SSL/TLS settings
    └── provider-secrets/     renders the provider + Stripe + ZUMIK_SECRET_KEY env template
What it manages
Cloudflare (modules/cloudflare)
DNS
Proxied A (and optional AAAA) records for the apex, www, api, console, and auth pointing at
the origin VPS; docs either delegated or on the origin. status is left alone by default - the
uptime worker owns that hostname.
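As a rough illustration of the DNS layout described above, the proxied records could be expressed with one `for_each` over the hostnames. This is a sketch, not the module's actual code: the variable names (`zone_id`, `origin_ipv4`, `origin_ipv6`) and resource labels are assumptions, and `docs` and `status` are left out on purpose.

```hcl
# Hypothetical inputs; the real module's variable names may differ.
variable "zone_id" {
  type = string
}

variable "origin_ipv4" {
  type = string
}

variable "origin_ipv6" {
  type    = string
  default = null # AAAA records are skipped when this is unset
}

locals {
  # "@" is the zone apex; status is intentionally absent (owned by the uptime worker).
  proxied_hosts = toset(["@", "www", "api", "console", "auth"])
}

resource "cloudflare_record" "a" {
  for_each = local.proxied_hosts

  zone_id = var.zone_id
  name    = each.key
  type    = "A"
  value   = var.origin_ipv4 # later 4.x provider releases also accept `content`
  proxied = true
}

resource "cloudflare_record" "aaaa" {
  for_each = var.origin_ipv6 == null ? toset([]) : local.proxied_hosts

  zone_id = var.zone_id
  name    = each.key
  type    = "AAAA"
  value   = var.origin_ipv6
  proxied = true
}
```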
WAF rate limiting
Per-IP edge caps in the http_ratelimit phase, stricter on auth and inference than on reads. This
is the outermost layer only; per-key and per-project limits live deeper at
Bifrost and api-core.
Zone TLS settings
SSL Full (strict), Always Use HTTPS, HSTS (1y, preload, subdomains), a TLS 1.2 floor with TLS 1.3 enabled, Brotli, HTTP/3, Early Hints, and Rocket Loader off (it breaks Next.js hydration).
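A minimal sketch of how these settings map onto `cloudflare_zone_settings_override` in the 4.x Cloudflare provider. The resource label and the `zone_id` variable are assumptions; the setting values mirror the list above.

```hcl
# Hypothetical input; the real module may receive the zone id differently.
variable "zone_id" {
  type = string
}

resource "cloudflare_zone_settings_override" "zone" {
  zone_id = var.zone_id

  settings {
    ssl              = "strict" # SSL Full (strict)
    always_use_https = "on"
    min_tls_version  = "1.2"
    tls_1_3          = "on"
    brotli           = "on"
    http3            = "on"
    early_hints      = "on"
    rocket_loader    = "off" # breaks Next.js hydration

    # HSTS: one year, preload, include subdomains
    security_header {
      enabled            = true
      max_age            = 31536000
      include_subdomains = true
      preload            = true
    }
  }
}
```

Some of these settings depend on the zone's plan, so an apply may reject entries the plan does not allow.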
Defense in depth
A server-to-server API client cannot solve a challenge, so the inference burst cap always blocks rather than challenges. Reads carry looser limits.
The WAF rules implement the layered rate-limit policy: a failed-login brute-force cap (5/IP/10 min, counting only 401/403/429), a signup and password-reset cap (3/IP/hour), an always-block inference burst cap, and a looser read-endpoint cap.
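To make the failed-login cap concrete, here is a sketch of one rule in the `http_ratelimit` phase. The match expression (a POST to the `auth` host), the resource label, and the `zone_id` variable are assumptions for illustration; only the 5 requests per IP per 10 minutes and the 401/403/429 counting come from the policy above.

```hcl
# Hypothetical input; the real module may receive the zone id differently.
variable "zone_id" {
  type = string
}

resource "cloudflare_ruleset" "rate_limit" {
  zone_id = var.zone_id
  name    = "Edge rate limits"
  kind    = "zone"
  phase   = "http_ratelimit"

  rules {
    description = "Failed-login brute-force cap: 5 per IP per 10 minutes"
    # Assumed match; the real rule likely targets specific login paths.
    expression = "(http.host eq \"auth.zumik.ai\" and http.request.method eq \"POST\")"
    action     = "block"

    ratelimit {
      characteristics     = ["cf.colo.id", "ip.src"]
      period              = 600 # seconds
      requests_per_period = 5
      mitigation_timeout  = 600 # assumed; the source does not state a block duration
      # Count only failed attempts, not every request that matches the rule.
      counting_expression = "(http.host eq \"auth.zumik.ai\" and http.request.method eq \"POST\" and http.response.code in {401 403 429})"
    }
  }
}
```

The signup, inference, and read caps would be further `rules` blocks in the same ruleset. Which periods and response-based counting expressions are accepted varies by Cloudflare plan.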
Secrets (modules/provider-secrets)
This module renders the five managed-provider keys, the Stripe trio, and ZUMIK_SECRET_KEY into a
.env-shaped file via local_sensitive_file, so values never surface in plan or apply output. All
inputs are sensitive variables with no defaults - nothing is hardcoded. The rendered file is
gitignored; deliver it to the host or a secret store out of band.
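A sketch of the rendering step, assuming the input shapes shown under Usage. Apart from `ZUMIK_SECRET_KEY`, the environment variable names written into the file are hypothetical, and the Stripe values are omitted for brevity.

```hcl
variable "zumik_secret_key" {
  type      = string
  sensitive = true # no default, so it must come from TF_VAR_* or a secret store
}

variable "provider_keys" {
  type = object({
    openai    = string
    anthropic = string
    gemini    = string
    xai       = string
    fireworks = string
  })
  sensitive = true
}

# local_sensitive_file redacts its content in plan and apply output.
resource "local_sensitive_file" "env" {
  filename        = "${path.root}/rendered.env"
  file_permission = "0600"

  content = join("\n", [
    "ZUMIK_SECRET_KEY=${var.zumik_secret_key}",
    "OPENAI_API_KEY=${var.provider_keys.openai}",       # hypothetical name
    "ANTHROPIC_API_KEY=${var.provider_keys.anthropic}", # hypothetical name
    "GEMINI_API_KEY=${var.provider_keys.gemini}",       # hypothetical name
    "XAI_API_KEY=${var.provider_keys.xai}",             # hypothetical name
    "FIREWORKS_API_KEY=${var.provider_keys.fireworks}", # hypothetical name
    "", # trailing newline
  ])
}
```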
This is a convenience renderer, not a secret store. For production, prefer Cloudflare Secrets or a
KMS and feed TF_VAR_* from there. Terraform state records the rendered file's attributes, including
its sensitive content and content hashes, so the backend bucket must itself be access-controlled
and encrypted, and you must never run with local state in CI.
Usage
Secrets come from the environment, never a committed file:
cd infra/terraform
export TF_VAR_cloudflare_api_token='cf-token-with-zone+dns+waf-edit'
export TF_VAR_zumik_secret_key="$(openssl rand -hex 32)"
export TF_VAR_provider_keys='{openai="sk-...",anthropic="sk-ant-...",gemini="...",xai="...",fireworks="..."}'
export TF_VAR_stripe='{secret_key="sk_live_...",webhook_secret="whsec_...",price_base_monthly="price_..."}'
cp terraform.tfvars.example terraform.tfvars # fill the NON-secret values (account id, origin ip)
Validate locally without a backend
terraform init -backend=false
terraform fmt -check
terraform validate
Init against remote state
The s3 backend block in versions.tf is intentionally partial - no bucket or endpoint is
committed. Cloudflare R2 is S3-compatible; point a backend.hcl at the R2 bucket holding Zumik IaC
state.
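A `backend.hcl` for R2 might look like the sketch below, assuming Terraform 1.6 or later (where the `endpoints` map and `use_path_style` exist). The bucket name, state key, and account id are placeholders, and the file itself should stay out of version control.

```hcl
# backend.hcl (not committed). Angle-bracket values are placeholders.
bucket = "<r2-bucket-name>"
key    = "<path/to/terraform.tfstate>"
region = "auto" # R2 ignores the region, but the s3 backend requires one

endpoints = {
  s3 = "https://<account-id>.r2.cloudflarestorage.com"
}

# R2 is S3-compatible but not AWS, so skip the AWS-specific checks.
skip_credentials_validation = true
skip_region_validation      = true
skip_requesting_account_id  = true
skip_metadata_api_check     = true
skip_s3_checksum            = true
use_path_style              = true
```

The R2 access key pair can be supplied through the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, which keeps credentials out of the file as well.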
terraform init -backend-config=backend.hcl
Plan and apply
terraform plan
terraform apply
Conventions
- Provider versions are pinned exactly in versions.tf; commit the generated .terraform.lock.hcl.
- Every secret variable is sensitive = true and has no default.
- Real Cloudflare resource names only (cloudflare_record, cloudflare_ruleset, cloudflare_zone_settings_override, the cloudflare_zone data source).
- terraform.tfvars, rendered.env, and all state are gitignored.
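For reference, exact pinning plus a partial backend could take roughly this shape in versions.tf. The version numbers are placeholders, not the repository's actual pins.

```hcl
terraform {
  required_version = "~> 1.9" # placeholder constraint

  required_providers {
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "= 4.52.0" # placeholder; pin the 4.x release you have tested
    }
    local = {
      source  = "hashicorp/local"
      version = "= 2.5.2" # placeholder
    }
  }

  # Partial backend: bucket and endpoint are supplied at init time
  # via -backend-config, so nothing environment-specific is committed.
  backend "s3" {}
}
```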
The rendered provider keys feed the Bifrost gateway config, and the WAF ruleset is the edge layer of the rate-limiting story described there.