Zumik
v2 · Native state

Usage

Every generation records one usage event with the customer-facing metrics: tokens, realized reuse and its evidence level, the full QoS outcome, the resolved routing, and the amount charged. This endpoint reads them back, with a rolled-up summary and an optional grouped breakdown. See reuse metrics and QoS.

Realized reuse is not the same as reuse opportunity: a handle is not a cache hit. This endpoint reports what was actually captured, with an evidence level on every number.

All requests require a bearer API key. See authentication.

Read usage

GET /v2/usage

group_by (string, query)

Add an aggregated breakdown along one dimension: provider, model, profile, region, or day.

curl "https://api.zumik.ai/v2/usage?group_by=provider" \
  -H "Authorization: Bearer $ZUMIK_API_KEY"

{
  "object": "usage",
  "project_id": "prj_01jy7n0a4c8m2t6v9q3wrxk7bd",
  "summary": {
    "request_count": 124,
    "input_tokens": 2261760,
    "output_tokens": 39680,
    "cached_tokens": 1854720,
    "realized_reused_tokens": 1854720,
    "realized_reuse_ratio": 0.82,
    "charged_micros": 1284000,
    "avg_latency_ms": 1620
  },
  "group_by": "provider",
  "breakdown": [
    {
      "key": "openai",
      "request_count": 90,
      "input_tokens": 1640000,
      "output_tokens": 28800,
      "realized_reused_tokens": 1345000,
      "charged_micros": 930000
    }
  ],
  "data": [
    {
      "id": "trc_01jy7nkl45o7p8q9r0s1t2u3vw",
      "object": "usage_event",
      "project_id": "prj_01jy7n0a4c8m2t6v9q3wrxk7bd",
      "created_at": "2026-06-15T16:25:55Z",
      "requested_model": "code.fast",
      "alias_release_id": "alr_01jy7nhi23m5n6o7p8q9r0s1tu",
      "resolved_provider": "openai",
      "resolved_model": "gpt-4o",
      "region": "us",
      "execution_profile": "managed_provider",
      "execution_mode": "live",
      "input_tokens": 18240,
      "output_tokens": 320,
      "total_tokens": 18560,
      "cached_tokens": 14980,
      "realized_reused_tokens": 14980,
      "reuse_evidence_level": "provider_reported",
      "cache_tier": "provider",
      "qos_class": "interactive",
      "qos_admission": "admitted",
      "qos_completion": "completed",
      "target_met": true,
      "deadline_met": true,
      "degraded": false,
      "fallback_used": false,
      "reason_code": null,
      "ttft_ms": 312,
      "latency_ms": 1840,
      "charged_micros": 10350
    }
  ]
}
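
The curl request above can also be prepared from Python. The following is a minimal sketch using only the standard library; the helper name build_usage_request and the client-side check of group_by are assumptions made for illustration, not part of the API. It builds the request without sending it, so it can be inspected first.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint and allowed group_by values are taken from this page.
BASE_URL = "https://api.zumik.ai/v2/usage"
GROUP_BY_VALUES = {"provider", "model", "profile", "region", "day"}


def build_usage_request(api_key, group_by=None):
    """Build (but do not send) a GET /v2/usage request.

    Hypothetical helper: validating group_by on the client is a convenience
    added here, not documented server behavior.
    """
    if group_by is not None and group_by not in GROUP_BY_VALUES:
        raise ValueError(f"unsupported group_by: {group_by!r}")
    url = BASE_URL
    if group_by is not None:
        url = f"{url}?{urlencode({'group_by': group_by})}"
    headers = {"Authorization": f"Bearer {api_key}"}
    return Request(url, headers=headers, method="GET")


# Sending is left to the caller, for example:
#   import json, os
#   from urllib.request import urlopen
#   req = build_usage_request(os.environ["ZUMIK_API_KEY"], "provider")
#   with urlopen(req) as resp:
#       usage = json.load(resp)
```
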
object (string)

Always usage.

project_id (string)

The owning project.

summary (object)

Rolled-up totals across all matching events.

summary
request_count (integer)
Number of generations.
input_tokens (integer)
Total input tokens.
output_tokens (integer)
Total output tokens.
cached_tokens (integer)
Total provider-reported cached tokens.
realized_reused_tokens (integer)
Total realized reused tokens.
realized_reuse_ratio (number)
realized_reused_tokens divided by input_tokens.
charged_micros (integer)
Total amount charged, in micro-USD (millionths of a US dollar).
avg_latency_ms (integer)
Mean latency across events, in milliseconds.

group_by (string)

The dimension the breakdown is grouped by, when requested.

breakdown (array)

Per-group rows sorted by request count, each with key, request_count, input_tokens, output_tokens, realized_reused_tokens, and charged_micros. Omitted when no group_by is set.

data (array)

The most recent usage events, newest first (capped per response). See the event fields below.
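
As a rough illustration of how the summary and breakdown numbers relate, the sketch below recomputes the reuse ratio and converts micro-USD to dollars. The values are copied from the example response above. The helper names are hypothetical, and applying the same ratio to breakdown rows is an assumption, since breakdown rows do not carry a ratio field of their own.

```python
def reuse_ratio(reused_tokens, input_tokens):
    """Realized reused tokens over input tokens; 0.0 when there is no input."""
    return reused_tokens / input_tokens if input_tokens else 0.0


def micros_to_usd(micros):
    """Convert micro-USD (millionths of a dollar) to USD."""
    return micros / 1_000_000


# Values copied from the example response on this page.
summary = {
    "input_tokens": 2261760,
    "realized_reused_tokens": 1854720,
    "realized_reuse_ratio": 0.82,
    "charged_micros": 1284000,
}
breakdown = [
    {"key": "openai", "input_tokens": 1640000,
     "realized_reused_tokens": 1345000, "charged_micros": 930000},
]

overall = reuse_ratio(summary["realized_reused_tokens"], summary["input_tokens"])
print(f"overall reuse {overall:.2f}, "
      f"charged ${micros_to_usd(summary['charged_micros']):.2f}")

# Assumption: a per-group ratio is derived the same way as the summary ratio.
for row in breakdown:
    ratio = reuse_ratio(row["realized_reused_tokens"], row["input_tokens"])
    print(f"{row['key']}: reuse {ratio:.2f}, "
          f"charged ${micros_to_usd(row['charged_micros']):.2f}")
```

Guarding against zero input tokens keeps the helper safe for empty projects or groups with no traffic.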

Usage event fields

id (string)
The event id; equals the generation's trace id, prefixed trc_.
object (string)
Always usage_event.
created_at (string)
RFC 3339 timestamp.
requested_model (string)
The model alias requested.
alias_release_id (string)
The alr_... release that served the routing, or null.
resolved_provider (string)
The provider the request resolved to.
resolved_model (string)
The concrete model.
region (string)
The execution region.
execution_profile (string)
managed_provider, byok, or subscription.
execution_mode (string)
live or placeholder.
input_tokens (integer)
Input tokens for the generation.
output_tokens (integer)
Output tokens for the generation.
total_tokens (integer)
Input plus output tokens.
cached_tokens (integer)
Provider-reported cached tokens.
realized_reused_tokens (integer)
Tokens actually reused.
reuse_evidence_level (string)
The evidence level behind the reuse number: provider_reported, runtime_confirmed, router_inferred, trace_estimated, or unknown.
cache_tier (string)
Where reuse was served from: provider, gpu, host_ram, nvme, remote_kv, or unknown.
qos_class (string)
The QoS class: interactive, standard, background, or batch.
qos_admission (string)
admitted, queued, rejected, or expired_before_start.
qos_completion (string)
completed, failed, cancelled, or expired_during_execution.
target_met (boolean)
Whether the TTFT target was met, or null.
deadline_met (boolean)
Whether the deadline was met, or null.
degraded (boolean)
Whether the request was served degraded.
fallback_used (boolean)
Whether a fallback profile served it.
reason_code (string)
A stable reason code when a target was missed, otherwise null.
ttft_ms (integer)
Time to first token, in milliseconds.
latency_ms (integer)
Total latency, in milliseconds.
charged_micros (integer)
Amount charged for this generation, in micro-USD.
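
The event fields above lend themselves to simple client-side checks. The sketch below tallies realized reuse per evidence level, lists events that missed a target or deadline, and verifies that total_tokens equals input plus output. The function names are hypothetical, the second sample event is invented for illustration, and treating null as "not a miss" is an assumption.

```python
from collections import defaultdict


def reused_by_evidence(events):
    """Sum realized_reused_tokens per reuse_evidence_level."""
    totals = defaultdict(int)
    for event in events:
        totals[event["reuse_evidence_level"]] += event["realized_reused_tokens"]
    return dict(totals)


def missed_targets(events):
    """Return ids of events where target_met or deadline_met is False.

    Assumption: null (None) is not counted as a miss.
    """
    return [e["id"] for e in events
            if e["target_met"] is False or e["deadline_met"] is False]


def token_totals_consistent(event):
    """Check that total_tokens is input_tokens plus output_tokens."""
    return event["total_tokens"] == event["input_tokens"] + event["output_tokens"]


events = [
    # Fields copied from the example response on this page.
    {"id": "trc_01jy7nkl45o7p8q9r0s1t2u3vw",
     "input_tokens": 18240, "output_tokens": 320, "total_tokens": 18560,
     "realized_reused_tokens": 14980,
     "reuse_evidence_level": "provider_reported",
     "target_met": True, "deadline_met": True},
    # Hypothetical event, invented here only to show a missed target.
    {"id": "trc_hypothetical",
     "input_tokens": 1000, "output_tokens": 50, "total_tokens": 1050,
     "realized_reused_tokens": 0,
     "reuse_evidence_level": "unknown",
     "target_met": False, "deadline_met": None},
]

print(reused_by_evidence(events))
print(missed_targets(events))
print(all(token_totals_consistent(e) for e in events))
```

Keeping the tally keyed by evidence level avoids mixing provider-reported numbers with estimated ones in a single total.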

Errors

Status  Code             When
401     invalid_api_key  Missing or invalid API key.

See the full table on errors.
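
One way a client might surface the 401 case is sketched below with the Python standard library. The exception class and the fetch_usage wrapper are hypothetical names; the sketch relies only on the HTTP status code, because the shape of the error response body is not described on this page.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


class InvalidApiKeyError(Exception):
    """Raised when the API answers 401 (invalid_api_key)."""


def fetch_usage(request, opener=urlopen):
    """Send a prepared request and return the parsed JSON body.

    `opener` is injectable so the function can be exercised without a network.
    Only the status code is inspected; the error body format is not assumed.
    """
    try:
        with opener(request) as response:
            return json.load(response)
    except HTTPError as err:
        if err.code == 401:
            raise InvalidApiKeyError("missing or invalid API key") from err
        raise  # other statuses are left to the caller
```

Other status codes are re-raised unchanged so that callers can handle them against the full errors table.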
