Analytics

Advanced, server-side analytics over the full retained usage window — a time series, SLA attainment, latency percentiles, cost and savings, and reuse/cache distributions, with a time range, filters, and a rich grouped breakdown.

/v2/usage is the simple, OpenAI-shaped feed: a flat summary plus the last page of raw events. /v2/analytics is the advanced surface. It aggregates server-side over the full retained event window, so the numbers reflect real history rather than only the last 100 events. It accepts a time range and exact-match filters, and returns a time series, SLA/reliability metrics, latency percentiles, and a rich grouped breakdown.

All money is in micro-USD; all rates are ratios in [0, 1]. Reuse is the realized capture (a handle is not a cache hit) carried with its evidence level. See reuse metrics and QoS.
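For display, the integer micro-USD amounts and [0, 1] ratios usually need converting. The following is a minimal sketch, assuming 1 USD = 1,000,000 micro-USD; the helper names `micros_to_usd` and `format_rate` are illustrative and not part of the API.

```python
from decimal import Decimal

# Assumption: "micro-USD" means one millionth of a US dollar.
MICROS_PER_USD = 1_000_000


def micros_to_usd(micros: int) -> Decimal:
    """Convert an integer micro-USD amount to dollars without float rounding."""
    return Decimal(micros) / MICROS_PER_USD


def format_rate(rate: float) -> str:
    """Render a ratio in [0, 1] as a percentage string with two decimals."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"rate out of range: {rate}")
    return f"{rate * 100:.2f}%"
```

Using `Decimal` keeps money exact: `micros_to_usd(12840000)` is `Decimal("12.84")`, and `format_rate(0.1446)` gives `"14.46%"`.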

All requests require a bearer API key. See authentication.

Read analytics

GET /v2/analytics

window (string, query)

Relative window ending now: 24h, 7d, 30d, 4w (units s/m/h/d/w; a bare number is seconds). Default 30d. Ignored when start is set.
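To make the window grammar concrete, here is a client-side sketch that turns a window string into a duration. It assumes non-negative integer amounts and lowercase unit letters only; the server's exact grammar (decimals, whitespace, case) is not documented here, and `parse_window` is a hypothetical helper.

```python
import re
from datetime import timedelta

# Seconds per unit letter, following the documented units s/m/h/d/w.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_window(window: str) -> timedelta:
    """Parse strings such as '24h', '7d', '4w', or a bare number of seconds."""
    match = re.fullmatch(r"(\d+)([smhdw]?)", window.strip())
    if match is None:
        raise ValueError(f"malformed window: {window!r}")
    amount, unit = match.groups()
    # A bare number has no unit letter and is treated as seconds.
    return timedelta(seconds=int(amount) * _UNIT_SECONDS[unit or "s"])
```

For instance, `parse_window("4w")` equals `timedelta(weeks=4)` and `parse_window("90")` equals `timedelta(seconds=90)`.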

start (string, query)

Explicit RFC 3339 start (overrides window).

end (string, query)

Explicit RFC 3339 end. Default now.

interval (string, query)

Time-series bucket: day (default) or hour. An hour request over a long window is automatically widened to day so the series stays bounded.

group_by (string, query)

Add a rich breakdown along one dimension: provider, model, profile, region, key, qos_class, or cache_tier.

provider, model, profile, region, key, qos_class (string, query)

Exact-match filters that narrow the dataset before aggregating. Combine freely (e.g. ?provider=anthropic&qos_class=interactive).
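The query string can also be assembled programmatically. This sketch uses only the standard library and the parameter names listed above; the function name `analytics_url` and its defaults are illustrative choices, not part of the API.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.zumik.ai/v2/analytics"
# The documented exact-match filter parameters.
FILTER_KEYS = ("provider", "model", "profile", "region", "key", "qos_class")


def analytics_url(window="30d", interval="day", group_by=None, **filters):
    """Build a GET /v2/analytics URL from a window, interval, and filters."""
    unknown = set(filters) - set(FILTER_KEYS)
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    params = {"window": window, "interval": interval}
    if group_by is not None:
        params["group_by"] = group_by
    # Skip filters that are unset so they do not appear in the query string.
    params.update({k: v for k, v in filters.items() if v is not None})
    return f"{BASE_URL}?{urlencode(params)}"
```

Calling `analytics_url(provider="anthropic", qos_class="interactive")` yields a URL that combines both filters, and `urlencode` takes care of escaping any special characters in the values.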

curl "https://api.zumik.ai/v2/analytics?window=7d&interval=day&group_by=provider" \
  -H "Authorization: Bearer $ZUMIK_API_KEY"

{
  "object": "analytics",
  "project_id": "prj_01jy7n0a4c8m2t6v9q3wrxk7bd",
  "range": { "start": "2026-06-15T00:00:00Z", "end": "2026-06-22T00:00:00Z", "interval": "day", "buckets": 8 },
  "filters": { "provider": null, "model": null, "profile": null, "region": null, "key": null, "qos_class": null },
  "summary": {
    "request_count": 1240,
    "input_tokens": 22617600,
    "output_tokens": 396800,
    "total_tokens": 23014400,
    "cached_tokens": 18547200,
    "realized_reused_tokens": 18547200,
    "realized_reuse_ratio": 0.82,
    "charged_micros": 12840000,
    "direct_cost_micros": 15010000,
    "savings_micros": 2170000,
    "savings_rate": 0.1446,
    "latency": { "avg_ms": 1620, "p50_ms": 1480, "p95_ms": 3120, "p99_ms": 4860 },
    "sla": {
      "target_met_rate": 0.964,
      "deadline_met_rate": 0.991,
      "degraded_rate": 0.012,
      "fallback_rate": 0.004,
      "completion": { "completed": 1232, "failed": 8 },
      "top_reason_codes": [{ "key": "cache_miss", "count": 41 }]
    },
    "cache_tiers": [{ "key": "provider", "count": 18547200 }],
    "evidence_levels": [{ "key": "provider_reported", "count": 1180 }, { "key": "router_inferred", "count": 60 }]
  },
  "series": [
    {
      "ts": "2026-06-15T00:00:00Z",
      "request_count": 160,
      "charged_micros": 1610000,
      "direct_cost_micros": 1880000,
      "savings_micros": 270000,
      "realized_reuse_ratio": 0.80,
      "p50_ms": 1500, "p95_ms": 3200, "p99_ms": 4900,
      "target_met_rate": 0.95,
      "fallback_rate": 0.0
    }
  ],
  "group_by": "provider",
  "breakdown": [
    {
      "key": "openai",
      "request_count": 900,
      "input_tokens": 16400000,
      "output_tokens": 288000,
      "realized_reused_tokens": 13450000,
      "realized_reuse_ratio": 0.82,
      "charged_micros": 9300000,
      "direct_cost_micros": 10800000,
      "savings_micros": 1500000,
      "savings_rate": 0.1389,
      "avg_latency_ms": 1580,
      "p95_ms": 3010,
      "target_met_rate": 0.97,
      "fallback_rate": 0.003
    }
  ]
}

object (string)

Always analytics.

range (object)

The resolved window and bucketing: start, end, interval (day/hour), and buckets (series length).

filters (object)

Echo of the applied exact-match filters, so a caller knows exactly what the numbers cover.

summary (object)

Roll-up across the whole range.

summary

request_count (integer)

Generations in range.

charged_micros (integer)

Total charged (micro-USD).

direct_cost_micros (integer)

What the same tokens would cost at provider list price — the going-direct baseline.

savings_micros (integer)

direct_cost_micros − charged_micros, floored at 0.

savings_rate (number)

savings_micros / direct_cost_micros: the effective discount versus going direct.
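The two savings fields can be reproduced from the cost fields. The sketch below follows the definitions above; returning a rate of 0.0 when the direct cost is zero is an assumption, since the page does not say how the server handles that case.

```python
def savings(direct_cost_micros: int, charged_micros: int) -> tuple[int, float]:
    """Return (savings_micros, savings_rate) from the two cost fields."""
    # Savings are floored at 0, so a charge above list price never goes negative.
    savings_micros = max(0, direct_cost_micros - charged_micros)
    # Assumption: a zero baseline yields a rate of 0.0 rather than an error.
    rate = savings_micros / direct_cost_micros if direct_cost_micros > 0 else 0.0
    return savings_micros, rate
```

With the sample summary values, `savings(15010000, 12840000)` gives 2170000 micro-USD and a rate that rounds to 0.1446, matching the response shown earlier.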

realized_reuse_ratio (number)

Realized reused tokens over input tokens.

latency (object)

avg_ms plus p50_ms, p95_ms, p99_ms (nearest-rank percentiles).
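For readers unfamiliar with the nearest-rank method, here is a sketch of its textbook form: sort the samples and take the value at rank ceil(p/100 × n). Whether the server uses exactly this variant is an assumption, and `nearest_rank` is an illustrative name.

```python
import math


def nearest_rank(values, percentile):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(values)
    # Rank is 1-based: the smallest rank covering at least p% of the samples.
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[rank - 1]
```

Nearest-rank always returns an observed sample and never interpolates, so with few samples p95 and p99 can coincide with the maximum.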

sla (object)

Reliability from the QoS outcome: target_met_rate, deadline_met_rate, degraded_rate, fallback_rate, a completion count map, and the top_reason_codes.

cache_tiers (array)

Realized-reuse tokens by cache tier (provider, gpu, host_ram, nvme, remote_kv), count desc.

evidence_levels (array)

Events by reuse-evidence level — how trustworthy each reuse claim is.

series (array)

Continuous time buckets across the range, oldest first (empty buckets are present, not dropped, so charts don't lie about gaps). Each bucket carries ts, request_count, charged_micros, direct_cost_micros, savings_micros, realized_reuse_ratio, p50_ms/p95_ms/p99_ms, target_met_rate, and fallback_rate.
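Because empty buckets are kept, a client can check that a series has no gaps before charting it. The sketch below assumes consecutive `ts` values are exactly one interval apart (one day or one hour); `is_continuous` and `parse_ts` are hypothetical helpers.

```python
from datetime import datetime, timedelta

# Assumed spacing between consecutive bucket timestamps for each interval.
STEP = {"day": timedelta(days=1), "hour": timedelta(hours=1)}


def parse_ts(ts: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp such as '2026-06-15T00:00:00Z'."""
    # datetime.fromisoformat rejects a trailing 'Z' before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def is_continuous(series, interval):
    """True if buckets are oldest-first and exactly one interval apart."""
    step = STEP[interval]
    stamps = [parse_ts(bucket["ts"]) for bucket in series]
    return all(later - earlier == step
               for earlier, later in zip(stamps, stamps[1:]))
```

A series with a single bucket, or none, is trivially continuous under this check.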

group_by (string)

The breakdown dimension, when requested.

breakdown (array)

Per-group rows sorted by spend desc, each carrying the full metric set (requests, tokens, spend, savings + rate, reuse ratio, average + p95 latency, target-met and fallback rates). Omitted when no group_by is set.
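One common use of the breakdown is each group's share of total spend. This sketch relies only on the `key` and `charged_micros` fields of each row; the function name `spend_shares` is illustrative.

```python
def spend_shares(breakdown):
    """Map each group key to its fraction of total charged_micros."""
    total = sum(row["charged_micros"] for row in breakdown)
    if total == 0:
        # No spend at all: report a zero share for every group.
        return {row["key"]: 0.0 for row in breakdown}
    return {row["key"]: row["charged_micros"] / total for row in breakdown}
```

Since rows arrive sorted by spend descending, the resulting dictionary preserves that order and the first entry is the largest share.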

Errors

A malformed window/start/end, a start not before end, or a range over 366 days returns 400 invalid_request_error. An unknown bearer key returns 401.
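The range rules can be checked client-side before sending a request, which avoids a round trip that would end in a 400. This is a sketch of the two documented constraints only, assuming timezone-aware datetimes; `check_range` is a hypothetical helper and the server remains the authority.

```python
from datetime import datetime, timedelta, timezone

# Ranges longer than this are rejected with 400 invalid_request_error.
MAX_RANGE = timedelta(days=366)


def check_range(start: datetime, end: datetime = None) -> None:
    """Raise ValueError if the range would be rejected by the documented rules."""
    # end defaults to "now", mirroring the end parameter's default.
    end = end or datetime.now(timezone.utc)
    if start >= end:
        raise ValueError("start must be before end")
    if end - start > MAX_RANGE:
        raise ValueError("range exceeds 366 days")
```

A range of exactly 366 days passes, since only ranges over 366 days are described as errors.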
