BYOC clusters
Register, list, retrieve, heartbeat, and deregister bring-your-own-cloud clusters. The control plane records your data planes; it never holds GPUs.
BYOC is an escalation path, activated only where replay proves it adds value. This control-plane registry records the customer-cloud clusters a project operates (region, runtime stack, orchestrator, KV-cache layer, and autoscaling envelope) so that the broker can route to them and the console can show them on dashboards. The data plane itself is deployed from the Helm charts under infra/byoc/ and reports its health back through heartbeats. Cluster ids are prefixed byc_. See BYOC execution and the BYOC stack.
All requests require a bearer API key. See authentication.
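Callers who are not using curl can attach the same bearer header with Python's standard library. This is a minimal sketch. byoc_request is a hypothetical helper, not part of any Zumik SDK. It only builds the request and leaves sending to the caller. The base URL and headers are copied from the curl examples on this page.

```python
import json
import os
import urllib.request
from typing import Optional

API_BASE = "https://api.zumik.ai"  # base URL used by the curl examples


def byoc_request(method: str, path: str, body: Optional[dict] = None,
                 api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for a /v2/byoc endpoint."""
    # Fall back to the same environment variable the curl examples use.
    key = api_key if api_key is not None else os.environ["ZUMIK_API_KEY"]
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(API_BASE + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {key}")
    if data is not None:
        # JSON bodies are sent with the same content type as the curl examples.
        req.add_header("Content-Type", "application/json")
    return req
```

To send the request, pass the result to urllib.request.urlopen and decode the response body with json.load.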
Register a cluster
POST /v2/byoc/clusters
| Parameter | Type | Required / default | Description |
|---|---|---|---|
| name | string | required | A human label for the cluster. |
| region | string | required | The region the cluster runs in, e.g. us. |
| runtime | string | default: sglang+flashinfer | The runtime lane, e.g. sglang+flashinfer, llm-d+vllm, or trtllm. |
| kv_cache | string | default: lmcache+mooncake | The KV-cache management layer. |
| orchestrator | string | default: dynamo | The orchestrator, e.g. dynamo or aibrix. |
| endpoint | string | optional | The customer-cloud data-plane endpoint the broker dispatches to. |
| autoscaling | object | default: { "min_replicas": 1, "max_replicas": 4, "target_ttft_ms": 500 } | The autoscaling envelope. max_replicas must be greater than zero, and min_replicas must be at most max_replicas. |
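The envelope rules and defaults above can be checked before a request leaves the client. This is a minimal sketch. build_register_payload and validate_autoscaling are hypothetical helpers, not a Zumik SDK. The sketch fills in the documented defaults and applies the two documented envelope rules. The server remains the authority and may enforce checks this sketch does not.

```python
# Documented defaults for POST /v2/byoc/clusters.
DEFAULTS = {
    "runtime": "sglang+flashinfer",
    "kv_cache": "lmcache+mooncake",
    "orchestrator": "dynamo",
}
DEFAULT_AUTOSCALING = {"min_replicas": 1, "max_replicas": 4, "target_ttft_ms": 500}


def validate_autoscaling(envelope: dict) -> None:
    """Raise ValueError if the envelope breaks the documented rules."""
    for key in ("min_replicas", "max_replicas", "target_ttft_ms"):
        value = envelope.get(key)
        # All three fields are required integers (bool is rejected explicitly).
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"autoscaling.{key} must be an integer")
    if envelope["max_replicas"] <= 0:
        raise ValueError("max_replicas must be greater than zero")
    if envelope["min_replicas"] > envelope["max_replicas"]:
        raise ValueError("min_replicas must be at most max_replicas")


def build_register_payload(name: str, region: str, **overrides) -> dict:
    """Return a registration body with the documented defaults filled in."""
    if not name or not region:
        raise ValueError("name and region must be non-empty")
    body = {"name": name, "region": region, **DEFAULTS}
    body["autoscaling"] = dict(DEFAULT_AUTOSCALING)
    body.update(overrides)  # caller-supplied fields replace the defaults
    validate_autoscaling(body["autoscaling"])
    return body
```

Sending the defaults explicitly is optional, because the server applies the same ones. Filling them in locally shows what the stored cluster will contain.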
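autoscaling object fields:

| Field | Type | Required |
|---|---|---|
| min_replicas | integer | required |
| max_replicas | integer | required |
| target_ttft_ms | integer | required |

```bash
curl https://api.zumik.ai/v2/byoc/clusters \
  -H "Authorization: Bearer $ZUMIK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "us-east hot lane",
    "region": "us",
    "autoscaling": { "min_replicas": 1, "max_replicas": 8, "target_ttft_ms": 400 }
  }'
```

Returns the new cluster object:

```json
{
  "id": "byc_01jy7nuv23w5x6y7z8a9b0c1d4",
  "object": "byoc_cluster",
  "project_id": "prj_01jy7n0a4c8m2t6v9q3wrxk7bd",
  "name": "us-east hot lane",
  "region": "us",
  "status": "registering",
  "runtime": "sglang+flashinfer",
  "kv_cache": "lmcache+mooncake",
  "orchestrator": "dynamo",
  "endpoint": null,
  "autoscaling": { "min_replicas": 1, "max_replicas": 8, "target_ttft_ms": 400 },
  "created_at": "2026-06-15T16:48:18Z",
  "updated_at": "2026-06-15T16:48:18Z",
  "last_heartbeat_at": null
}
```

Response fields:

| Field | Type | Description |
|---|---|---|
| id | string | Opaque cluster id, prefixed byc_. |
| object | string | Always byoc_cluster. |
| project_id | string | The owning project. |
| name | string | As registered. |
| region | string | As registered. |
| status | string | registering, active, draining, or down. A freshly registered cluster starts at registering. |
| runtime | string | As registered, or the default. |
| kv_cache | string | As registered, or the default. |
| orchestrator | string | As registered, or the default. |
| endpoint | string or null | The data-plane endpoint, or null if none was given. |
| autoscaling | object | The autoscaling envelope, as registered or defaulted. |
| created_at | string | Creation timestamp (UTC). |
| updated_at | string | Last-update timestamp (UTC). |
| last_heartbeat_at | string or null | Time of the most recent heartbeat, or null before the first one. |

List clusters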
GET /v2/byoc/clusters
```bash
curl https://api.zumik.ai/v2/byoc/clusters \
  -H "Authorization: Bearer $ZUMIK_API_KEY"
```

Returns { "object": "list", "data": [ ... ] } containing the project's cluster objects.
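A list response can be filtered client-side, for example to show which clusters could take traffic right now. This is a minimal sketch. It assumes a cluster is usable only when its status is active and its endpoint is non-null. That criterion and the helper name routable_cluster_ids are illustrative only and are not the broker's documented routing rule.

```python
from typing import Optional


def routable_cluster_ids(list_response: dict, region: Optional[str] = None) -> list:
    """Return ids of active clusters that have a data-plane endpoint."""
    if list_response.get("object") != "list":
        raise ValueError("expected a list object")
    ids = []
    for cluster in list_response["data"]:
        if cluster["status"] != "active":
            continue  # registering, draining, and down clusters are skipped
        if cluster.get("endpoint") is None:
            continue  # nothing for the broker to dispatch to yet
        if region is not None and cluster["region"] != region:
            continue
        ids.append(cluster["id"])
    return ids
```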
Retrieve a cluster
GET /v2/byoc/clusters/{cluster_id}
| Parameter | Type | In | Description |
|---|---|---|---|
| cluster_id | string | path (required) | The byc_... id to fetch. |

```bash
curl https://api.zumik.ai/v2/byoc/clusters/byc_01jy7nuv23w5x6y7z8a9b0c1d4 \
  -H "Authorization: Bearer $ZUMIK_API_KEY"
```

Returns the cluster object.
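Heartbeats prove liveness, so a retrieved cluster's last_heartbeat_at works as a staleness signal. This is a minimal sketch. It assumes timestamps always use the second-precision UTC form shown in the examples (2026-06-15T16:48:18Z). heartbeat_age_seconds is a hypothetical helper, and any alerting threshold applied to its result is your own choice, not a documented limit.

```python
from datetime import datetime, timezone
from typing import Optional


def heartbeat_age_seconds(cluster: dict, now: Optional[datetime] = None) -> Optional[float]:
    """Seconds since last_heartbeat_at, or None if the cluster never reported."""
    raw = cluster.get("last_heartbeat_at")
    if raw is None:
        return None
    # Assumed format: the Z-suffixed UTC timestamps used in the examples.
    seen = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - seen).total_seconds()
```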
Heartbeat
POST /v2/byoc/clusters/{cluster_id}/heartbeat
The BYOC operator posts heartbeats to report the cluster's status and prove liveness. The first heartbeat that reports active moves a registering cluster to active.
| Parameter | Type | In | Description |
|---|---|---|---|
| cluster_id | string | path (required) | The byc_... id to update. |
| status | string | body (required) | The reported data-plane status: active, draining, or down. |

```bash
curl https://api.zumik.ai/v2/byoc/clusters/byc_01jy7nuv23w5x6y7z8a9b0c1d4/heartbeat \
  -H "Authorization: Bearer $ZUMIK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "status": "active" }'
```

Returns the cluster object with the updated status and last_heartbeat_at.
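A client-side model of a heartbeat's effect on the cluster object can help when testing an operator. This sketch rests on one stated assumption: the reported status simply replaces the stored one. That matches the documented registering-to-active transition. The server's handling of other transitions, such as registering to draining, is not specified here. apply_heartbeat is a hypothetical helper.

```python
from datetime import datetime, timezone

# The only statuses a heartbeat may report; anything else is a 400.
REPORTABLE = {"active", "draining", "down"}


def apply_heartbeat(cluster: dict, reported: str, now: datetime) -> dict:
    """Return a copy of the cluster object as it might look after a heartbeat."""
    if reported not in REPORTABLE:
        raise ValueError(f"status must be one of {sorted(REPORTABLE)}")
    updated = dict(cluster)  # leave the caller's object untouched
    # Assumption: the reported status replaces the stored status, so the first
    # "active" heartbeat moves a "registering" cluster to "active".
    updated["status"] = reported
    updated["last_heartbeat_at"] = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return updated
```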
Deregister a cluster
DELETE /v2/byoc/clusters/{cluster_id}
| Parameter | Type | In | Description |
|---|---|---|---|
| cluster_id | string | path (required) | The byc_... id to deregister. |

```bash
curl -X DELETE https://api.zumik.ai/v2/byoc/clusters/byc_01jy7nuv23w5x6y7z8a9b0c1d4 \
  -H "Authorization: Bearer $ZUMIK_API_KEY"
```

```json
{
  "id": "byc_01jy7nuv23w5x6y7z8a9b0c1d4",
  "object": "byoc_cluster.deregistered",
  "deleted": true
}
```

HiCache activation plan
POST /v2/byoc/hicache-plan
Decides whether serving a cached prefix from a cache tier beats recomputing it on the GPU, per the §18.6 activation rule: fetch only when expected_recompute_cost > lookup + transfer + decompression + queue_delay. Supply replay-measured costs (in ms of TTFT), and the endpoint returns the cheapest viable tier, a per-tier breakdown, and a recommendation. Run it before turning on hicache.* in the BYOC stack. The endpoint is pure planning: it holds no GPUs and stores nothing.
| Parameter | Type | Required / default | Description |
|---|---|---|---|
| expected_recompute_ms | number | required | Expected GPU cost to recompute the prefix, in ms: the left side of the inequality. |
| tiers | array | required | Candidate cache tiers and their measured fetch costs (1 to 16 entries). |
tier object fields:

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | required | One of gpu_hbm, host_ram, local_nvme, or remote. |
| lookup_ms | number | required | Lookup cost, in ms. |
| transfer_ms | number | required | Transfer cost, in ms. |
| decompression_ms | number | required | Decompression cost, in ms. |
| queue_delay_ms | number | required | Queueing delay, in ms. |

```bash
curl https://api.zumik.ai/v2/byoc/hicache-plan \
  -H "Authorization: Bearer $ZUMIK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "expected_recompute_ms": 40,
    "tiers": [
      { "name": "host_ram", "lookup_ms": 1, "transfer_ms": 3, "decompression_ms": 0.5, "queue_delay_ms": 1 },
      { "name": "local_nvme", "lookup_ms": 2, "transfer_ms": 12, "decompression_ms": 3, "queue_delay_ms": 4 }
    ]
  }'
```

```json
{
  "object": "hicache_plan",
  "expected_recompute_ms": 40,
  "decision": { "decision": "fetch", "tier": "host_ram", "saved_ms": 34.5 },
  "tiers": [
    { "name": "host_ram", "total_ms": 5.5, "beats_recompute": true, "saved_ms": 34.5 },
    { "name": "local_nvme", "total_ms": 21.0, "beats_recompute": true, "saved_ms": 19.0 }
  ],
  "recommendation": "activate_tier"
}
```

Response fields:

| Field | Type | Description |
|---|---|---|
| decision | object | { "decision": "fetch", "tier": "...", "saved_ms": N } for the cheapest viable tier, or { "decision": "recompute" } when no tier beats recompute. |
| tiers | array | Per-tier verdicts: total_ms, beats_recompute, and saved_ms (negative when the fetch loses). |
| recommendation | string | activate_tier (turn on hicache.enabled) or recompute_only (leave it off for this workload). |
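The activation rule is plain arithmetic, so the plan can be reproduced locally, for example to sanity-check replay numbers before calling the endpoint. This is a sketch, and hicache_plan is a hypothetical name. Two behaviors are assumptions because this page does not specify them: among equally cheap viable tiers the first listed wins, and the validation order is arbitrary. Run on the request example, it yields the same decision, per-tier totals, and recommendation as the documented response.

```python
import math

COST_FIELDS = ("lookup_ms", "transfer_ms", "decompression_ms", "queue_delay_ms")


def hicache_plan(expected_recompute_ms: float, tiers: list) -> dict:
    """Fetch from a tier only if recompute > lookup + transfer + decompression + queue_delay."""
    if not 1 <= len(tiers) <= 16:
        raise ValueError("tiers must have 1 to 16 entries")
    if not math.isfinite(expected_recompute_ms):
        raise ValueError("expected_recompute_ms must be finite")
    verdicts = []
    for tier in tiers:
        costs = [float(tier[field]) for field in COST_FIELDS]
        if not all(math.isfinite(c) for c in costs):
            raise ValueError("tier costs must be finite")
        total = sum(costs)
        verdicts.append({
            "name": tier["name"],
            "total_ms": total,
            "beats_recompute": expected_recompute_ms > total,  # strict inequality
            "saved_ms": expected_recompute_ms - total,  # negative when the fetch loses
        })
    viable = [v for v in verdicts if v["beats_recompute"]]
    if viable:
        best = min(viable, key=lambda v: v["total_ms"])  # assumption: first listed wins ties
        decision = {"decision": "fetch", "tier": best["name"], "saved_ms": best["saved_ms"]}
    else:
        decision = {"decision": "recompute"}
    return {
        "object": "hicache_plan",
        "expected_recompute_ms": expected_recompute_ms,
        "decision": decision,
        "tiers": verdicts,
        "recommendation": "activate_tier" if viable else "recompute_only",
    }
```

A tier whose total exactly equals the recompute cost does not beat it, because the rule uses a strict inequality.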
Errors
| Status | Code | When |
|---|---|---|
| 400 | invalid_request_error | An empty name or region, an invalid autoscaling envelope, a heartbeat status other than active, draining, or down, or a hicache-plan request with no tiers or with non-finite costs. |
| 401 | invalid_api_key | Missing or invalid API key. |
| 404 | invalid_request_error | The cluster does not exist in this project. |
See the full table on errors.
Data rights
Export everything Zumik retains for a project, or request erasure. GDPR and CCPA data-subject rights, scoped strictly to the caller's project.
Support tickets
The enterprise support portal API: open a ticket, exchange a message thread with the Zumik team, and track its status. The first-response SLA is derived from the project's plan tier.