BYOC clusters

Register, list, retrieve, heartbeat, and deregister bring-your-own-cloud clusters. The control plane records your data planes; it never holds GPUs.

BYOC is the escalation path, activated only where replay proves value. This control-plane registry records the customer-cloud clusters a project operates (region, runtime stack, orchestrator, KV-cache layer, and autoscaling envelope) so that the broker can route to them and the console can show them on dashboards. The data plane itself is deployed from the Helm charts under infra/byoc/ and reports health back through heartbeats. Cluster ids are prefixed byc_. See BYOC execution and the BYOC stack.

All requests require a bearer API key. See authentication.

Register a cluster

POST /v2/byoc/clusters

name (string, required)

A human label for the cluster.

region (string, required)

The region the cluster runs in, e.g. us.

runtime (string, default: sglang+flashinfer)

The runtime lane, e.g. sglang+flashinfer, llm-d+vllm, trtllm.

kv_cache (string, default: lmcache+mooncake)

The KV-cache management layer.

orchestrator (string, default: dynamo)

The orchestrator, e.g. dynamo or aibrix.

endpoint (string)

The customer-cloud data-plane endpoint the broker dispatches to.

autoscaling (object)

The autoscaling envelope. min_replicas must be at most max_replicas, and max_replicas must be greater than zero. Defaults to { "min_replicas": 1, "max_replicas": 4, "target_ttft_ms": 500 }.

autoscaling
min_replicas (integer, required)
Minimum replicas.
max_replicas (integer, required)
Maximum replicas.
target_ttft_ms (integer, required)
The TTFT SLA the autoscaler holds.
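The envelope rules above can be checked locally before a request is sent. The following is a minimal sketch, not the server's actual validation: the function name validate_autoscaling is hypothetical, and it enforces only the two constraints stated above plus the integer types. Whether the API also rejects negative min_replicas or non-positive target_ttft_ms is not stated here, so the sketch does not check for them.

```python
# Default envelope as documented for POST /v2/byoc/clusters.
DEFAULT_AUTOSCALING = {"min_replicas": 1, "max_replicas": 4, "target_ttft_ms": 500}


def validate_autoscaling(envelope: dict) -> dict:
    """Return the envelope if it satisfies the documented rules, else raise ValueError.

    Hypothetical client-side helper; the server remains the source of truth.
    """
    for key in ("min_replicas", "max_replicas", "target_ttft_ms"):
        value = envelope.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"{key} is required and must be an integer")
    if envelope["max_replicas"] <= 0:
        raise ValueError("max_replicas must be greater than zero")
    if envelope["min_replicas"] > envelope["max_replicas"]:
        raise ValueError("min_replicas must be at most max_replicas")
    return envelope
```

A caller would typically run this on the autoscaling object before building the request body, so that an invalid envelope fails fast instead of returning a 400.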
curl https://api.zumik.ai/v2/byoc/clusters \
  -H "Authorization: Bearer $ZUMIK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "us-east hot lane",
    "region": "us",
    "autoscaling": { "min_replicas": 1, "max_replicas": 8, "target_ttft_ms": 400 }
  }'
{
  "id": "byc_01jy7nuv23w5x6y7z8a9b0c1d4",
  "object": "byoc_cluster",
  "project_id": "prj_01jy7n0a4c8m2t6v9q3wrxk7bd",
  "name": "us-east hot lane",
  "region": "us",
  "status": "registering",
  "runtime": "sglang+flashinfer",
  "kv_cache": "lmcache+mooncake",
  "orchestrator": "dynamo",
  "endpoint": null,
  "autoscaling": { "min_replicas": 1, "max_replicas": 8, "target_ttft_ms": 400 },
  "created_at": "2026-06-15T16:48:18Z",
  "updated_at": "2026-06-15T16:48:18Z",
  "last_heartbeat_at": null
}
id (string)

Opaque cluster id, prefixed byc_.

object (string)

Always byoc_cluster.

project_id (string)

The owning project.

name (string)
The cluster label.
region (string)
The cluster region.
status (string)

registering, active, draining, or down. A freshly registered cluster starts at registering.

runtime (string)
The runtime lane.
kv_cache (string)
The KV-cache layer.
orchestrator (string)
The orchestrator.
endpoint (string)
The data-plane endpoint, or null.
autoscaling (object)
The autoscaling envelope.
created_at (string)
RFC 3339 timestamp.
updated_at (string)
RFC 3339 timestamp.
last_heartbeat_at (string)
When the cluster last reported, or null.
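For callers who prefer Python to curl, the registration request above can be sketched with the standard library. This is an illustrative sketch only: build_register_request is a hypothetical helper, not part of any Zumik SDK, and the base URL is taken from the curl examples. The function builds the request without sending it, and omits optional fields that are unset so the documented server-side defaults apply.

```python
import json
import urllib.request

API_BASE = "https://api.zumik.ai/v2"  # base URL as used in the curl examples

# Optional body fields documented for POST /v2/byoc/clusters.
OPTIONAL_FIELDS = ("runtime", "kv_cache", "orchestrator", "endpoint", "autoscaling")


def build_register_request(api_key: str, name: str, region: str, **optional) -> urllib.request.Request:
    """Build (but do not send) a cluster registration request. Hypothetical helper."""
    if not name or not region:
        raise ValueError("name and region are required and must be non-empty")
    unknown = set(optional) - set(OPTIONAL_FIELDS)
    if unknown:
        raise TypeError(f"unknown fields: {sorted(unknown)}")
    body = {"name": name, "region": region}
    # Forward only the optional fields that were set.
    for key in OPTIONAL_FIELDS:
        if optional.get(key) is not None:
            body[key] = optional[key]
    return urllib.request.Request(
        f"{API_BASE}/byoc/clusters",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request is then a matter of passing it to urllib.request.urlopen and decoding the JSON body, which should be the cluster object described above with status set to registering.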

List clusters

GET /v2/byoc/clusters

curl https://api.zumik.ai/v2/byoc/clusters \
  -H "Authorization: Bearer $ZUMIK_API_KEY"

Returns a list object, { "object": "list", "data": [ ... ] }, whose data array holds the cluster objects for the project.
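A common use of the list response is picking out clusters in a given state, for example those ready to receive traffic. The sketch below assumes only the list shape and the four status values documented on this page; clusters_by_status is a hypothetical helper name.

```python
# Status values documented for the cluster object.
CLUSTER_STATUSES = {"registering", "active", "draining", "down"}


def clusters_by_status(list_payload: dict, status: str) -> list:
    """Return the cluster objects in a decoded list response that have the given status."""
    if status not in CLUSTER_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    if list_payload.get("object") != "list":
        raise ValueError("expected a list object")
    return [c for c in list_payload.get("data", []) if c.get("status") == status]
```

Given the decoded JSON body of GET /v2/byoc/clusters, clusters_by_status(payload, "active") yields only the clusters currently marked active.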

Retrieve a cluster

GET /v2/byoc/clusters/{cluster_id}

cluster_id (string, path, required)

The byc_... id to fetch.

curl https://api.zumik.ai/v2/byoc/clusters/byc_01jy7nuv23w5x6y7z8a9b0c1d4 \
  -H "Authorization: Bearer $ZUMIK_API_KEY"

Returns the cluster object.

Heartbeat

POST /v2/byoc/clusters/{cluster_id}/heartbeat

The BYOC operator posts a heartbeat to advance the cluster status and prove liveness. The first active heartbeat moves a registering cluster to active.

cluster_id (string, path, required)

The byc_... id to update.

status (string, required)

The reported data-plane status: active, draining, or down.

curl https://api.zumik.ai/v2/byoc/clusters/byc_01jy7nuv23w5x6y7z8a9b0c1d4/heartbeat \
  -H "Authorization: Bearer $ZUMIK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "status": "active" }'

Returns the cluster object with the updated status and last_heartbeat_at.
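The effect of a heartbeat on the cluster object can be modeled locally, for instance in tests of an operator. The sketch below is an assumption-laden model, not the control plane's implementation: it assumes the cluster status is always set to the reported status (this page confirms only the registering to active transition), and apply_heartbeat is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Optional

# Statuses a heartbeat may report, per the parameter description above.
HEARTBEAT_STATUSES = {"active", "draining", "down"}


def apply_heartbeat(cluster: dict, reported_status: str, now: Optional[datetime] = None) -> dict:
    """Return a copy of the cluster with status and last_heartbeat_at updated.

    Assumption: the stored status simply mirrors the reported status.
    """
    if reported_status not in HEARTBEAT_STATUSES:
        raise ValueError("status must be active, draining, or down")
    now = now or datetime.now(timezone.utc)
    updated = dict(cluster)  # leave the caller's object untouched
    updated["status"] = reported_status
    # RFC 3339 timestamp in UTC, matching the format of the example responses.
    updated["last_heartbeat_at"] = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return updated
```

Applying an active heartbeat to a cluster in the registering state produces the transition described above; a status such as registering is rejected, mirroring the documented 400 for values outside active, draining, and down.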

Deregister a cluster

DELETE /v2/byoc/clusters/{cluster_id}

cluster_id (string, path, required)

The byc_... id to deregister.

curl -X DELETE https://api.zumik.ai/v2/byoc/clusters/byc_01jy7nuv23w5x6y7z8a9b0c1d4 \
  -H "Authorization: Bearer $ZUMIK_API_KEY"
{
  "id": "byc_01jy7nuv23w5x6y7z8a9b0c1d4",
  "object": "byoc_cluster.deregistered",
  "deleted": true
}

HiCache activation plan

POST /v2/byoc/hicache-plan

Decides whether serving a cached prefix from a cache tier beats recomputing it on the GPU, per the §18.6 activation rule: a tier is viable when expected_recompute_cost > lookup + transfer + decompression + queue_delay. Supply replay-measured costs (in ms of TTFT) and the endpoint returns the cheapest viable tier, a per-tier breakdown, and a recommendation. Run this before turning on hicache.* in the BYOC stack. The endpoint is pure planning: it holds no GPUs and stores nothing.

expected_recompute_ms (number, required)

Expected GPU cost to recompute the prefix, in ms. This is the left side of the inequality.

tiers (array, required)

Candidate cache tiers and their measured fetch costs (1-16 entries).

tier
name (string, required)
Tier label, e.g. gpu_hbm, host_ram, local_nvme, remote.
lookup_ms (number, required)
Index/lookup cost.
transfer_ms (number, required)
Transfer cost to the GPU.
decompression_ms (number, required)
Decompression cost.
queue_delay_ms (number, required)
Queue-delay cost under load.
curl https://api.zumik.ai/v2/byoc/hicache-plan \
  -H "Authorization: Bearer $ZUMIK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "expected_recompute_ms": 40,
    "tiers": [
      { "name": "host_ram",   "lookup_ms": 1, "transfer_ms": 3,  "decompression_ms": 0.5, "queue_delay_ms": 1 },
      { "name": "local_nvme", "lookup_ms": 2, "transfer_ms": 12, "decompression_ms": 3,   "queue_delay_ms": 4 }
    ]
  }'
{
  "object": "hicache_plan",
  "expected_recompute_ms": 40,
  "decision": { "decision": "fetch", "tier": "host_ram", "saved_ms": 34.5 },
  "tiers": [
    { "name": "host_ram",   "total_ms": 5.5,  "beats_recompute": true, "saved_ms": 34.5 },
    { "name": "local_nvme", "total_ms": 21.0, "beats_recompute": true, "saved_ms": 19.0 }
  ],
  "recommendation": "activate_tier"
}
decision (object)

{ "decision": "fetch", "tier": "...", "saved_ms": N } for the cheapest viable tier, or { "decision": "recompute" } when no tier beats recompute.

tiers (array)

Per-tier verdicts: total_ms, beats_recompute, and saved_ms (negative when the fetch loses).

recommendation (string)

activate_tier (turn on hicache.enabled) or recompute_only (leave it off for this workload).
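The activation rule is simple enough to reproduce offline, which helps when sanity-checking replay data before calling the endpoint. The following sketch reimplements the rule as this page describes it; it is not the service's code. It assumes total_ms is the plain sum of the four cost fields, that saved_ms is expected_recompute_ms minus total_ms, and that the recommendation is activate_tier whenever at least one tier beats recompute. Tie-breaking between equally cheap tiers is not documented, so the sketch keeps the first one listed.

```python
import math

COST_FIELDS = ("lookup_ms", "transfer_ms", "decompression_ms", "queue_delay_ms")


def hicache_plan(expected_recompute_ms: float, tiers: list) -> dict:
    """Local re-derivation of the hicache-plan response (hypothetical helper)."""
    if not 1 <= len(tiers) <= 16:
        raise ValueError("tiers must contain 1-16 entries")
    if not math.isfinite(expected_recompute_ms):
        raise ValueError("expected_recompute_ms must be finite")
    verdicts = []
    for tier in tiers:
        costs = [tier[field] for field in COST_FIELDS]
        if not all(math.isfinite(c) for c in costs):
            raise ValueError(f"non-finite cost in tier {tier['name']!r}")
        total = sum(costs)  # right side of the inequality
        verdicts.append({
            "name": tier["name"],
            "total_ms": total,
            "beats_recompute": expected_recompute_ms > total,
            "saved_ms": expected_recompute_ms - total,  # negative when the fetch loses
        })
    viable = [v for v in verdicts if v["beats_recompute"]]
    if viable:
        best = min(viable, key=lambda v: v["total_ms"])  # cheapest viable tier
        decision = {"decision": "fetch", "tier": best["name"], "saved_ms": best["saved_ms"]}
        recommendation = "activate_tier"
    else:
        decision = {"decision": "recompute"}
        recommendation = "recompute_only"
    return {
        "object": "hicache_plan",
        "expected_recompute_ms": expected_recompute_ms,
        "decision": decision,
        "tiers": verdicts,
        "recommendation": recommendation,
    }
```

With the request body from the curl example (recompute cost 40 ms), host_ram totals 1 + 3 + 0.5 + 1 = 5.5 ms and local_nvme totals 2 + 12 + 3 + 4 = 21 ms, so both beat recompute and host_ram is chosen with 34.5 ms saved, matching the example response.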

Errors

| Status | Code | When |
| --- | --- | --- |
| 400 | invalid_request_error | Empty name or region, an invalid autoscaling envelope, a heartbeat status outside active/draining/down, or a hicache-plan request with no tiers or with non-finite costs. |
| 401 | invalid_api_key | Missing or invalid API key. |
| 404 | invalid_request_error | The cluster does not exist in this project. |

See the full table on errors.
