Usage
Read per-generation usage events, a rolled-up summary, and an optional grouped breakdown. Every event records tokens, realized reuse, and the full QoS outcome.
Every generation records one usage event with the customer-facing metrics: tokens, realized reuse and its evidence level, the full QoS outcome, the resolved routing, and the amount charged. This endpoint reads them back, with a rolled-up summary and an optional grouped breakdown. See reuse metrics and QoS.
Realized reuse is not the same as reuse opportunity. A handle is not a cache hit; this surface reports what was actually captured, with an evidence level on every number.
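To make the distinction concrete, the sketch below tallies realized reused tokens per evidence level across a list of usage events. It relies only on the `realized_reused_tokens` and `reuse_evidence_level` event fields documented on this page; the helper name `reuse_by_evidence` and the sample events are illustrative assumptions, not part of the API.

```python
from collections import defaultdict


def reuse_by_evidence(events):
    """Sum realized_reused_tokens per reuse_evidence_level (hypothetical helper)."""
    totals = defaultdict(int)
    for event in events:
        # Treat a missing or null level as "unknown", one of the documented values.
        level = event.get("reuse_evidence_level") or "unknown"
        totals[level] += event.get("realized_reused_tokens") or 0
    return dict(totals)


# Illustrative events; only the two fields used by the helper are shown.
sample_events = [
    {"realized_reused_tokens": 14980, "reuse_evidence_level": "provider_reported"},
    {"realized_reused_tokens": 2000, "reuse_evidence_level": "router_inferred"},
    {"realized_reused_tokens": 500, "reuse_evidence_level": "provider_reported"},
]

print(reuse_by_evidence(sample_events))
```

Splitting the total this way keeps provider-reported numbers separate from inferred or estimated ones, which is one reasonable way to decide how much weight each figure deserves.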
All requests require a bearer API key. See authentication.
Read usage
GET /v2/usage
| Parameter | Type | In | Description |
|---|---|---|---|
| group_by | string | query | Add an aggregated breakdown along one dimension: provider, model, profile, region, or day. |
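A minimal Python sketch of the same request, using only the standard library. It builds the request without sending it, checks `group_by` against the five dimensions listed above, and attaches the bearer key. The function name `build_usage_request` and the client-side validation are assumptions for illustration; the server may validate differently.

```python
import os
import urllib.parse
import urllib.request

BASE_URL = "https://api.zumik.ai/v2/usage"
# Dimensions listed for the group_by query parameter.
GROUP_BY_VALUES = {"provider", "model", "profile", "region", "day"}


def build_usage_request(api_key, group_by=None):
    """Build (but do not send) a GET /v2/usage request (hypothetical helper)."""
    if group_by is not None and group_by not in GROUP_BY_VALUES:
        raise ValueError(f"unsupported group_by: {group_by!r}")
    url = BASE_URL
    if group_by is not None:
        url += "?" + urllib.parse.urlencode({"group_by": group_by})
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


# "test-key" is a placeholder used when ZUMIK_API_KEY is not set.
request = build_usage_request(
    os.environ.get("ZUMIK_API_KEY", "test-key"), group_by="provider"
)
print(request.full_url)
```

Passing the result to `urllib.request.urlopen(request)` would perform the call.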
curl "https://api.zumik.ai/v2/usage?group_by=provider" \
-H "Authorization: Bearer $ZUMIK_API_KEY"

{
"object": "usage",
"project_id": "prj_01jy7n0a4c8m2t6v9q3wrxk7bd",
"summary": {
"request_count": 124,
"input_tokens": 2261760,
"output_tokens": 39680,
"cached_tokens": 1854720,
"realized_reused_tokens": 1854720,
"realized_reuse_ratio": 0.82,
"charged_micros": 1284000,
"avg_latency_ms": 1620
},
"group_by": "provider",
"breakdown": [
{
"key": "openai",
"request_count": 90,
"input_tokens": 1640000,
"output_tokens": 28800,
"realized_reused_tokens": 1345000,
"charged_micros": 930000
}
],
"data": [
{
"id": "trc_01jy7nkl45o7p8q9r0s1t2u3vw",
"object": "usage_event",
"project_id": "prj_01jy7n0a4c8m2t6v9q3wrxk7bd",
"created_at": "2026-06-15T16:25:55Z",
"requested_model": "code.fast",
"alias_release_id": "alr_01jy7nhi23m5n6o7p8q9r0s1tu",
"resolved_provider": "openai",
"resolved_model": "gpt-4o",
"region": "us",
"execution_profile": "managed_provider",
"execution_mode": "live",
"input_tokens": 18240,
"output_tokens": 320,
"total_tokens": 18560,
"cached_tokens": 14980,
"realized_reused_tokens": 14980,
"reuse_evidence_level": "provider_reported",
"cache_tier": "provider",
"qos_class": "interactive",
"qos_admission": "admitted",
"qos_completion": "completed",
"target_met": true,
"deadline_met": true,
"degraded": false,
"fallback_used": false,
"reason_code": null,
"ttft_ms": 312,
"latency_ms": 1840,
"charged_micros": 10350
}
]
}

| Field | Type | Description |
|---|---|---|
| object | string | Always usage. |
| project_id | string | The owning project. |
| summary | object | Rolled-up totals across all matching events: request_count (integer), input_tokens (integer), output_tokens (integer), cached_tokens (integer), realized_reused_tokens (integer), realized_reuse_ratio (number), charged_micros (integer), and avg_latency_ms (integer). |
| group_by | string | The dimension the breakdown is grouped by, when requested. |
| breakdown | array | Per-group rows sorted by request count, each with key, request_count, input_tokens, output_tokens, realized_reused_tokens, and charged_micros. Omitted when no group_by is set. |
| data | array | The most recent usage events, newest first (capped per response). See the event fields below. |
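The summary totals can be cross-checked on the client. The sketch below uses the values from the example response and rests on two assumptions that this page does not state: that `realized_reuse_ratio` is `realized_reused_tokens` divided by `input_tokens` (which matches the example when rounded to two decimals), and that `charged_micros` counts millionths of a currency unit. The helper names are hypothetical.

```python
# Values copied from the example response above.
summary = {
    "input_tokens": 2261760,
    "realized_reused_tokens": 1854720,
    "realized_reuse_ratio": 0.82,
    "charged_micros": 1284000,
}
breakdown = [{"key": "openai", "request_count": 90, "charged_micros": 930000}]


def reuse_ratio(totals):
    """Assumed definition: realized reused tokens over input tokens."""
    if not totals["input_tokens"]:
        return 0.0
    return totals["realized_reused_tokens"] / totals["input_tokens"]


def micros_to_units(micros):
    """Assumed unit: one micro is a millionth of a currency unit."""
    return micros / 1_000_000


def charge_share(row, totals):
    """Fraction of the total charge attributed to one breakdown row."""
    if not totals["charged_micros"]:
        return 0.0
    return row["charged_micros"] / totals["charged_micros"]


print(round(reuse_ratio(summary), 2))
print(micros_to_units(summary["charged_micros"]))
for row in breakdown:
    print(row["key"], round(charge_share(row, summary), 2))
```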
Usage event fields
| Field | Type | Description |
|---|---|---|
| id | string | Prefixed trc_. |
| object | string | usage_event. |
| created_at | string | |
| requested_model | string | |
| alias_release_id | string | alr_... release that served the routing, or null. |
| resolved_provider | string | |
| resolved_model | string | |
| region | string | |
| execution_profile | string | managed_provider, byok, or subscription. |
| execution_mode | string | live or placeholder. |
| input_tokens | integer | |
| output_tokens | integer | |
| total_tokens | integer | |
| cached_tokens | integer | |
| realized_reused_tokens | integer | |
| reuse_evidence_level | string | provider_reported, runtime_confirmed, router_inferred, trace_estimated, or unknown. |
| cache_tier | string | provider, gpu, host_ram, nvme, remote_kv, or unknown. |
| qos_class | string | interactive, standard, background, or batch. |
| qos_admission | string | admitted, queued, rejected, or expired_before_start. |
| qos_completion | string | completed, failed, cancelled, or expired_during_execution. |
| target_met | boolean | Or null. |
| deadline_met | boolean | Or null. |
| degraded | boolean | |
| fallback_used | boolean | |
| reason_code | string | Or null. |
| ttft_ms | integer | |
| latency_ms | integer | |
| charged_micros | integer | |

Errors
| Status | Code | When |
|---|---|---|
| 401 | invalid_api_key | Missing or invalid API key. |
See the full table on errors.
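A client can turn the 401 status into a distinct error so callers can prompt for a new key rather than retry. The sketch below is one way to do that with the standard library; `InvalidApiKey`, `fetch_usage`, and the injectable `opener` argument are hypothetical names, and the shape of the error response body is not assumed. The demonstration runs offline against a fake opener.

```python
import io
import json
import urllib.error
import urllib.request
from email.message import Message


class InvalidApiKey(Exception):
    """Hypothetical client-side error for a 401 response."""


def fetch_usage(request, opener=urllib.request.urlopen):
    """Send the request and decode the JSON body, mapping 401 to InvalidApiKey."""
    try:
        with opener(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 401:
            raise InvalidApiKey("missing or invalid API key") from err
        raise  # Other statuses are left to the caller.


def fake_opener(request):
    """Stand-in for urlopen that always answers 401 (no network access)."""
    raise urllib.error.HTTPError(
        request.full_url, 401, "Unauthorized", Message(), io.BytesIO(b"")
    )


request = urllib.request.Request("https://api.zumik.ai/v2/usage")
try:
    fetch_usage(request, opener=fake_opener)
    outcome = "ok"
except InvalidApiKey as exc:
    outcome = str(exc)
print(outcome)
```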