Analytics
Advanced, server-side analytics over the full retained usage window — a time series, SLA attainment, latency percentiles, cost and savings, and reuse/cache distributions, with a time range, filters, and a rich grouped breakdown.
/v2/usage is the simple, OpenAI-shaped feed: a flat summary plus the last page of raw events. /v2/analytics is the advanced surface. It rolls everything up server-side over the full retained event window, so the numbers reflect real history rather than just the last 100 events. It adds a time range, exact-match filters, a time series, SLA/reliability metrics, latency percentiles, and a rich grouped breakdown.
All money is in micro-USD (millionths of a US dollar); all rates are ratios in [0, 1]. Reuse figures report realized capture, each carried with its evidence level; holding a handle does not count as a cache hit. See reuse metrics and QoS.
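Because amounts are integer micro-USD, a client can convert them to dollars exactly, without floating-point drift. A minimal Python sketch that relies only on the units stated above; `micros_to_usd` and `as_percent` are hypothetical helper names, not part of any Zumik SDK:

```python
from decimal import Decimal

# One micro-USD is one millionth of a US dollar.
MICROS_PER_USD = 1_000_000


def micros_to_usd(micros: int) -> Decimal:
    """Convert an integer micro-USD amount to an exact Decimal dollar value."""
    return Decimal(micros) / MICROS_PER_USD


def as_percent(rate: float) -> str:
    """Render a [0, 1] ratio (e.g. savings_rate) as a percentage string."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"rate out of range: {rate}")
    return f"{rate * 100:.2f}%"


print(micros_to_usd(12840000))  # 12.84
print(as_percent(0.1446))  # 14.46%
```

Decimal keeps the conversion exact, which matters when summing many small per-request amounts.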
All requests require a bearer API key. See authentication.
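The bearer header that the curl example sends can be attached with Python's standard library alone. This is a sketch: `build_request` is a hypothetical helper, the base URL is taken from this page's curl example, and falling back to the `ZUMIK_API_KEY` environment variable mirrors that example rather than any required convention.

```python
import os
import urllib.parse
import urllib.request
from typing import Mapping, Optional

# Base URL as used in this page's curl example.
API_BASE = "https://api.zumik.ai"


def build_request(path: str, params: Mapping[str, str],
                  api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request.

    Hypothetical helper: uses the ZUMIK_API_KEY environment variable when no
    key is passed explicitly, mirroring the curl example.
    """
    key = api_key if api_key is not None else os.environ["ZUMIK_API_KEY"]
    query = urllib.parse.urlencode(params)
    url = f"{API_BASE}{path}?{query}" if query else f"{API_BASE}{path}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {key}"},
        method="GET",
    )
```

Sending it is then `urllib.request.urlopen(req)`, with the response body parsed by `json.load`.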
Read analytics
GET /v2/analytics
window (string, query): Relative window ending now, such as 24h, 7d, 30d, or 4w (units s/m/h/d/w; a bare number is seconds). Default 30d. Ignored when start is set.
start (string, query): Explicit RFC 3339 start; overrides window.
end (string, query): Explicit RFC 3339 end. Default now.
interval (string, query): Time-series bucket, either day (default) or hour. An hour request over a long window is automatically widened to day so the series stays bounded.
group_by (string, query): Adds a rich breakdown along one dimension: provider, model, profile, region, key, qos_class, or cache_tier.
provider, model, profile, region, key, qos_class (string, query): Exact-match filters that narrow the dataset before aggregating. Combine freely (e.g. ?provider=anthropic&qos_class=interactive).
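The query grammar above can be mirrored client-side to catch typos before a request goes out. A sketch under stated assumptions: `m` is read as minutes (the page lists the unit letters but not their names), amounts are whole numbers, explicit start/end are not handled, and `parse_window` and `analytics_params` are hypothetical helpers. The server remains the authority on what it accepts.

```python
import re
from datetime import timedelta
from typing import Dict, Optional

# Seconds per unit letter; "m" is assumed to mean minutes.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3_600, "d": 86_400, "w": 604_800}
_WINDOW_RE = re.compile(r"(\d+)([smhdw]?)")

FILTERS = ("provider", "model", "profile", "region", "key", "qos_class")
GROUP_BY = FILTERS + ("cache_tier",)


def parse_window(window: str) -> timedelta:
    """Parse a relative window such as "7d"; a bare number means seconds."""
    match = _WINDOW_RE.fullmatch(window.strip())
    if match is None:
        raise ValueError(f"malformed window: {window!r}")
    amount, unit = int(match.group(1)), match.group(2) or "s"
    return timedelta(seconds=amount * _UNIT_SECONDS[unit])


def analytics_params(window: str = "30d", interval: str = "day",
                     group_by: Optional[str] = None,
                     **filters: Optional[str]) -> Dict[str, str]:
    """Validate arguments and build the query dict for GET /v2/analytics."""
    parse_window(window)  # raises ValueError on a malformed window
    if interval not in ("day", "hour"):
        raise ValueError(f"interval must be day or hour, not {interval!r}")
    if group_by is not None and group_by not in GROUP_BY:
        raise ValueError(f"unsupported group_by: {group_by!r}")
    unknown = sorted(set(filters) - set(FILTERS))
    if unknown:
        raise ValueError(f"unknown filters: {unknown}")
    params = {"window": window, "interval": interval}
    if group_by is not None:
        params["group_by"] = group_by
    # Unset filters are dropped so they do not narrow the dataset.
    params.update({name: value for name, value in filters.items() if value is not None})
    return params
```

For example, `analytics_params(window="7d", group_by="provider", provider="anthropic", qos_class="interactive")` yields the query for the filtered, provider-grouped view.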
curl "https://api.zumik.ai/v2/analytics?window=7d&interval=day&group_by=provider" \
  -H "Authorization: Bearer $ZUMIK_API_KEY"

{
  "object": "analytics",
  "project_id": "prj_01jy7n0a4c8m2t6v9q3wrxk7bd",
  "range": { "start": "2026-06-15T00:00:00Z", "end": "2026-06-22T00:00:00Z", "interval": "day", "buckets": 8 },
  "filters": { "provider": null, "model": null, "profile": null, "region": null, "key": null, "qos_class": null },
  "summary": {
    "request_count": 1240,
    "input_tokens": 22617600,
    "output_tokens": 396800,
    "total_tokens": 23014400,
    "cached_tokens": 18547200,
    "realized_reused_tokens": 18547200,
    "realized_reuse_ratio": 0.82,
    "charged_micros": 12840000,
    "direct_cost_micros": 15010000,
    "savings_micros": 2170000,
    "savings_rate": 0.1446,
    "latency": { "avg_ms": 1620, "p50_ms": 1480, "p95_ms": 3120, "p99_ms": 4860 },
    "sla": {
      "target_met_rate": 0.964,
      "deadline_met_rate": 0.991,
      "degraded_rate": 0.012,
      "fallback_rate": 0.004,
      "completion": { "completed": 1232, "failed": 8 },
      "top_reason_codes": [{ "key": "cache_miss", "count": 41 }]
    },
    "cache_tiers": [{ "key": "provider", "count": 18547200 }],
    "evidence_levels": [{ "key": "provider_reported", "count": 1180 }, { "key": "router_inferred", "count": 60 }]
  },
  "series": [
    {
      "ts": "2026-06-15T00:00:00Z",
      "request_count": 160,
      "charged_micros": 1610000,
      "direct_cost_micros": 1880000,
      "savings_micros": 270000,
      "realized_reuse_ratio": 0.80,
      "p50_ms": 1500, "p95_ms": 3200, "p99_ms": 4900,
      "target_met_rate": 0.95,
      "fallback_rate": 0.0
    }
  ],
  "group_by": "provider",
  "breakdown": [
    {
      "key": "openai",
      "request_count": 900,
      "input_tokens": 16400000,
      "output_tokens": 288000,
      "realized_reused_tokens": 13450000,
      "realized_reuse_ratio": 0.82,
      "charged_micros": 9300000,
      "direct_cost_micros": 10800000,
      "savings_micros": 1500000,
      "savings_rate": 0.1389,
      "avg_latency_ms": 1580,
      "p95_ms": 3010,
      "target_met_rate": 0.97,
      "fallback_rate": 0.003
    }
  ]
}

object (string): analytics.
range (object): The resolved window and bucketing: start, end, interval (day or hour), and buckets (the series length).
filters (object): Echo of the applied exact-match filters, so a caller knows exactly what the numbers cover.
summary (object): Roll-up across the whole range. Its fields include:
  request_count (integer)
  charged_micros (integer)
  direct_cost_micros (integer)
  savings_micros (integer): direct_cost_micros − charged_micros, floored at 0.
  savings_rate (number): savings_micros / direct_cost_micros, the effective discount versus going direct.
  realized_reuse_ratio (number)
  latency (object): avg_ms plus p50_ms, p95_ms, and p99_ms (nearest-rank percentiles).
  sla (object): target_met_rate, deadline_met_rate, degraded_rate, fallback_rate, a completion count map, and the top_reason_codes.
  cache_tiers (array): Counts per cache tier (provider, gpu, host_ram, nvme, remote_kv), sorted by count descending.
  evidence_levels (array): key/count pairs per evidence level (for example provider_reported and router_inferred).
series (array): Continuous time buckets across the range, oldest first. Empty buckets are present rather than dropped, so charts do not hide gaps. Each bucket carries ts, request_count, charged_micros, direct_cost_micros, savings_micros, realized_reuse_ratio, p50_ms/p95_ms/p99_ms, target_met_rate, and fallback_rate.
group_by (string): The requested breakdown dimension.
breakdown (array): Per-group rows sorted by spend descending, each carrying the full metric set (requests, tokens, spend, savings and savings rate, reuse ratio, average and p95 latency, target-met and fallback rates). Omitted when no group_by is set.
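The derived fields can be recomputed from their documented definitions, which is a useful consistency check when charting or re-aggregating. A minimal sketch with hypothetical helper names; returning 0.0 when direct cost is zero is an assumption, since the page does not say how that case is reported. The percentile helper implements the standard nearest-rank definition (the sorted sample at 1-based rank ceil(p * N / 100)), not necessarily the server's exact code.

```python
import math
from typing import Sequence


def savings_micros(direct_cost_micros: int, charged_micros: int) -> int:
    """direct_cost_micros - charged_micros, floored at 0 (as documented)."""
    return max(direct_cost_micros - charged_micros, 0)


def savings_rate(direct_cost_micros: int, charged_micros: int) -> float:
    """savings / direct cost; 0.0 for a zero direct cost is an assumption."""
    if direct_cost_micros <= 0:
        return 0.0
    return savings_micros(direct_cost_micros, charged_micros) / direct_cost_micros


def nearest_rank(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the sorted sample at 1-based rank ceil(p * N / 100)."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]
```

With the sample response, `savings_micros(15010000, 12840000)` is 2170000 and `savings_rate` rounds to 0.1446, matching summary; the openai breakdown row likewise gives 0.1389. Nearest-rank always returns an observed latency rather than an interpolated one.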
Errors
A malformed window, start, or end, a start that is not before end, or a range longer than 366 days returns 400 invalid_request_error. An unknown bearer key returns 401.
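The 400 conditions for an explicit range can be checked before sending, saving a round trip. A sketch assuming explicit RFC 3339 start/end strings: `check_range` and `parse_rfc3339` are hypothetical helpers, a trailing `Z` is rewritten because `datetime.fromisoformat` only accepts it from Python 3.11, and the server's own validation still decides.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

MAX_RANGE = timedelta(days=366)  # documented upper bound on a range


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 timestamp; a trailing Z/z is rewritten for older Pythons."""
    if value[-1:] in ("Z", "z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)  # raises ValueError if malformed
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp has no UTC offset: {value!r}")
    return parsed


def check_range(start: str, end: Optional[str] = None,
                now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Mirror the documented 400 rules for an explicit start/end pair."""
    start_dt = parse_rfc3339(start)
    # end defaults to now, as documented; `now` is injectable for testing.
    end_dt = parse_rfc3339(end) if end is not None else (now or datetime.now(timezone.utc))
    if start_dt >= end_dt:
        raise ValueError("start must be before end")
    if end_dt - start_dt > MAX_RANGE:
        raise ValueError("range is longer than 366 days")
    return start_dt, end_dt
```

A range of exactly 366 days passes; anything longer is rejected, matching the wording above.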