Zumik

Replay runs

Pin a baseline and a candidate execution profile, replay a captured traffic manifest, and render a signed report with full provenance and an evidence digest.

Replay is a product, not a script. A run pins a baseline and a candidate execution profile and compares them over a captured traffic manifest. Its report carries the provenance that §20.5 requires and a verifiable evidence digest (§20.7). Run IDs are prefixed rpl_. See the replay guide.

Lifecycle

A run moves through queued → running → completed or failed. Canceling a run while it is still queued moves it to canceled instead.

  • routing_simulation runs inline at create time. It compares routing policy only (no inference), and its result is fully determined by the alias resolver over a fixed seed sweep, so the create response already has status completed.
  • The other four classes are scheduled onto the background runner, which drains due runs and records each outcome. A run with a future scheduled_for stays queued until that time.
  • If api-core restarts mid-run, the interrupted run moves to failed with failure_reason set, rather than wedging the queue. Resubmit it as a new run to retry.
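A client that polls runs mostly needs to know which statuses are terminal and which moves are legal. The following is a minimal client-side sketch of the lifecycle described above; TRANSITIONS, is_terminal, and can_transition are hypothetical names, not part of any Zumik SDK.

```python
# Client-side model of the replay-run lifecycle (hypothetical helper, not a Zumik SDK).
# Legal moves: queued -> running -> completed | failed, and queued -> canceled.
# An api-core restart mid-run is simply running -> failed, so it needs no extra edge.
TRANSITIONS: dict[str, frozenset[str]] = {
    "queued": frozenset({"running", "canceled"}),
    "running": frozenset({"completed", "failed"}),
    "completed": frozenset(),
    "failed": frozenset(),
    "canceled": frozenset(),
}


def is_terminal(status: str) -> bool:
    """A status is terminal when no transition leaves it."""
    return not TRANSITIONS[status]


def can_transition(current: str, target: str) -> bool:
    """True if the lifecycle allows moving from `current` to `target`."""
    return target in TRANSITIONS.get(current, frozenset())


if __name__ == "__main__":
    print(can_transition("queued", "canceled"))   # True
    print(can_transition("running", "canceled"))  # False: cancel only works while queued
```

A routing_simulation run already reports completed in its create response, so a client never observes queued or running for it.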

Traffic manifest

Every class except routing_simulation replays an ordered list of §20.2 trace envelopes. Supply it one of two ways:

  • Inline: pass a traces array. Use this for full_fidelity_evaluation, where each trace carries the messages to re-execute (an encrypted_full_fidelity capture).
  • From your usage: set traffic_manifest_ref to "usage:N" to build a manifest from your last N recorded usage events. Usage events carry token shapes and timing but never prompt text, so a token-shape replay needs no new capture pipeline and leaks no PII.

A single run's manifest is capped at 5,000 traces.
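The choice between the two manifest sources can be sketched as a small client-side helper that produces the manifest part of a create-run body. manifest_fields is a hypothetical name, and two behaviors are assumptions of this sketch rather than documented API behavior: it treats inline traces and "usage:N" as mutually exclusive, and it enforces the 5,000-trace cap before sending instead of guessing how the server handles an oversized manifest.

```python
# Builds the manifest part of a create-run body. manifest_fields is a hypothetical
# client helper. Assumptions (not documented API behavior): inline traces and
# "usage:N" are mutually exclusive, and the 5,000-trace cap is checked client-side.
from typing import Any, Optional

MAX_TRACES = 5_000  # per-run manifest cap stated on this page


def manifest_fields(
    replay_class: str,
    traces: Optional[list[dict[str, Any]]] = None,
    usage_events: Optional[int] = None,
) -> dict[str, Any]:
    if replay_class == "routing_simulation":
        return {}  # routing_simulation replays no manifest
    if (traces is None) == (usage_events is None):
        raise ValueError("supply exactly one of traces or usage_events")
    if traces is not None:
        if not 1 <= len(traces) <= MAX_TRACES:
            raise ValueError(f"inline manifest must hold 1 to {MAX_TRACES} traces")
        return {"traces": traces}
    if replay_class == "full_fidelity_evaluation":
        # Usage events never carry prompt text, so they cannot supply messages.
        raise ValueError("full_fidelity_evaluation needs inline traces with messages")
    if not 1 <= usage_events <= MAX_TRACES:
        raise ValueError(f"usage:N needs 1 <= N <= {MAX_TRACES}")
    return {"traffic_manifest_ref": f"usage:{usage_events}"}
```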

All requests require a bearer API key. See authentication.

Create a replay run

POST /v2/replay-runs

baseline · string · required

The baseline target or profile, e.g. openai/gpt-4o@2025-01-01 or managed. A name matching a live alias resolves through its weighted release; provider/model[@revision] is a fixed target.

candidate · string · required

The candidate target or profile to compare against the baseline, in the same forms baseline accepts, e.g. byoc_us_east or anthropic/claude-3-7-sonnet.

replay_class · string · default: routing_simulation

One of routing_simulation, synthetic_performance, tokenized_performance, full_fidelity_evaluation, purge_verification. See classes.

traffic_manifest_ref · string

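A manifest reference. "usage:N" builds a manifest from your last N usage events; an unsupported reference returns 400. Every class except routing_simulation needs either this or traces.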

traces · array

Inline manifest. Each entry is a trace envelope with input_tokens and output_tokens, plus optional candidate_reuse_tokens, realized_reused_tokens, ttft_ms, latency_ms, and namespace_generation. For full_fidelity_evaluation, each entry also carries messages ([{ role, content }]).

repetitions · integer · default: 1

Number of times to replay the manifest. Values outside 1–1000 are clamped into that range.

scheduled_for · string

An RFC 3339 timestamp; the run stays queued until then. If omitted, the run is eligible to start immediately.

concurrency · integer · default: 1

The concurrency group the runner models; recorded in provenance.

curl https://api.zumik.ai/v2/replay-runs \
  -H "Authorization: Bearer $ZUMIK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "baseline": "managed",
    "candidate": "byoc_us_east",
    "replay_class": "tokenized_performance",
    "traffic_manifest_ref": "usage:500"
  }'
{
  "id": "rpl_01jy7nfg01j2k3l4m5n6o7p8qr",
  "object": "replay_run",
  "project_id": "prj_01jy7n0a4c8m2t6v9q3wrxk7bd",
  "created_at": "2026-06-15T16:18:09Z",
  "status": "queued",
  "baseline": "managed",
  "candidate": "byoc_us_east",
  "replay_class": "tokenized_performance",
  "traffic_manifest_ref": "usage:500",
  "repetitions": 1,
  "concurrency": 1,
  "scheduled_for": null,
  "started_at": null,
  "completed_at": null,
  "failure_reason": null,
  "attempt": 0,
  "manifest_size": 500,
  "metrics": null
}
status · string

queued, running, completed, failed, or canceled.

manifest_size · integer

How many trace envelopes the run will replay.

started_at / completed_at · string

RFC 3339 timestamps, set when the runner claims the run and when it finalizes it; null until then.

failure_reason · string

Set only on a failed run; null otherwise.

attempt · integer

Increments each time the runner claims the run.

metrics · object

Computed results once completed; null while queued/running. Shape depends on the class.
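For clients that prefer a language over curl, here is a sketch of the same create call using only the Python standard library. create_run_request is a hypothetical helper: it builds the request without sending it, rejects an empty baseline or candidate before the API would return 400, mirrors the server-side repetitions clamp, and formats scheduled_for as RFC 3339 in UTC. The "Z"-suffixed format is one valid RFC 3339 form, chosen here as an assumption about what reads most clearly.

```python
# Python equivalent of the curl example above, standard library only.
# create_run_request is a hypothetical helper; it builds the request but does not send it.
import json
import urllib.request
from datetime import datetime, timezone
from typing import Any, Optional

API_BASE = "https://api.zumik.ai/v2"


def create_run_request(
    api_key: str,
    baseline: str,
    candidate: str,
    replay_class: str = "routing_simulation",
    repetitions: int = 1,
    scheduled_for: Optional[datetime] = None,
    **manifest: Any,  # e.g. traffic_manifest_ref="usage:500" or traces=[...]
) -> urllib.request.Request:
    if not baseline or not candidate:
        # The API answers 400 for an empty baseline or candidate; fail early instead.
        raise ValueError("baseline and candidate must be non-empty")
    body: dict[str, Any] = {
        "baseline": baseline,
        "candidate": candidate,
        "replay_class": replay_class,
        # Mirror the documented 1-1000 clamp so the body states what will actually run.
        "repetitions": max(1, min(1000, repetitions)),
        **manifest,
    }
    if scheduled_for is not None:
        # RFC 3339 in UTC with a trailing Z; naive datetimes are treated as local time.
        utc = scheduled_for.astimezone(timezone.utc).replace(microsecond=0)
        body["scheduled_for"] = utc.isoformat().replace("+00:00", "Z")
    return urllib.request.Request(
        f"{API_BASE}/replay-runs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen and decode the JSON body. For a background class such as tokenized_performance, the returned run starts in queued.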

Replay classes

| Class | What it does | Manifest |
| --- | --- | --- |
| routing_simulation | Compares baseline vs candidate routing over a seed sweep (no inference). Runs inline. | none |
| tokenized_performance | Replays exact recorded token shapes through the reuse-adjusted cost model; reports per-request cost and reuse-capture deltas with confidence intervals, plus routing divergence. | token shapes |
| synthetic_performance | Generates a structurally similar workload from the manifest's distribution and projects cost and TTFT (TTFT grounded in each trace's observed TTFT, reduced by extra reuse). | token shapes |
| full_fidelity_evaluation | Re-executes recorded turns through the broker for both targets and measures real latency, output-token rate, and output divergence. Budget-gated real spend. | messages |
| purge_verification | Confirms that artifacts from a purged namespace generation can no longer be reused. | namespace_generation |
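Before submitting inline traces, a client can check each envelope against the Manifest column of the table. missing_fields is a hypothetical helper, and it assumes input_tokens and output_tokens are required on every envelope because the traces parameter does not mark them optional. The server remains the authority and may apply checks this sketch does not know about.

```python
# Pre-checks inline traces against the Manifest column of the classes table.
# missing_fields is a hypothetical helper. Assumption: input_tokens and
# output_tokens are required on every envelope (they are not marked optional).
from typing import Any

BASE_FIELDS = ("input_tokens", "output_tokens")
EXTRA_FIELDS: dict[str, tuple[str, ...]] = {
    "full_fidelity_evaluation": ("messages",),
    "purge_verification": ("namespace_generation",),
}


def missing_fields(replay_class: str, traces: list[dict[str, Any]]) -> list[tuple[int, str]]:
    """Return (trace index, field name) for each required field a trace lacks."""
    if replay_class == "routing_simulation":
        return []  # no manifest is replayed, so nothing is required
    required = BASE_FIELDS + EXTRA_FIELDS.get(replay_class, ())
    return [
        (i, name)
        for i, trace in enumerate(traces)
        for name in required
        if name not in trace
    ]
```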

List replay runs

GET /v2/replay-runs

Returns { "object": "list", "data": [...] }, newest first. Each entry is a run object (without the manifest body).

Cancel a replay run

POST /v2/replay-runs/{replay_run_id}/cancel

Cancels a run that is still queued. Canceling a run that is already running, completed, failed, or canceled returns 400 invalid_request_error.
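Because a run can leave queued between your last poll and the cancel call, it helps to treat the documented 400 as "too late" rather than as a crash. The sketch below does that with the standard library; cancel_run is a hypothetical helper, and its open_fn parameter exists only so it can be exercised without a network. It relies on urllib.request.urlopen raising HTTPError for 4xx and 5xx responses, and it does not parse the success body, whose shape this page does not specify.

```python
# Cancel a queued run, treating the documented 400 (already running or finished)
# as "too late". cancel_run is a hypothetical helper; open_fn is injectable for tests.
import urllib.error
import urllib.request
from typing import Any, Callable

API_BASE = "https://api.zumik.ai/v2"


def cancel_run(
    api_key: str,
    run_id: str,
    open_fn: Callable[[urllib.request.Request], Any] = urllib.request.urlopen,
) -> bool:
    """Return True if the cancel was accepted, False if the run had already left queued."""
    req = urllib.request.Request(
        f"{API_BASE}/replay-runs/{run_id}/cancel",
        headers={"Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    try:
        with open_fn(req):  # urlopen raises HTTPError for 4xx/5xx responses
            return True
    except urllib.error.HTTPError as err:
        if err.code == 400:
            return False
        raise  # 401 (bad key) and 404 (unknown run) are real errors
```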

Retrieve a replay run

GET /v2/replay-runs/{replay_run_id}

Returns the run object. Poll this endpoint (or the list endpoint) to watch a run, including a scheduled one, progress from queued to completed or failed.
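A polling loop only has to stop on a terminal status and give up after a deadline. In this sketch, wait_for_run is a hypothetical helper: fetch_run is any callable that performs GET /v2/replay-runs/{replay_run_id} and returns the decoded run object, and sleep and clock are injectable so the loop can be tested without waiting. The 5-second interval and one-hour timeout are arbitrary defaults, not documented limits.

```python
# Poll a run until it reaches a terminal status. wait_for_run is a hypothetical
# helper; fetch_run returns the run object as a dict, sleep/clock are injectable.
import time
from typing import Any, Callable

TERMINAL = {"completed", "failed", "canceled"}


def wait_for_run(
    fetch_run: Callable[[str], dict[str, Any]],
    run_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    deadline = clock() + timeout_s
    while True:
        run = fetch_run(run_id)
        if run["status"] in TERMINAL:
            return run  # check failure_reason when status is "failed"
        if clock() >= deadline:
            raise TimeoutError(f"{run_id} still {run['status']} after {timeout_s}s")
        sleep(interval_s)
```

A run with a future scheduled_for legitimately stays queued until its time, so size timeout_s to cover the wait.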

Render a signed report

GET /v2/replay-runs/{replay_run_id}/report

The self-describing report (§20.5/§20.7): the full provenance block, a traffic-manifest summary, the computed metrics (or a pending note before completion), stated assumptions and known limitations, the recommended profile, and a verifiable evidence_digest.

{
  "object": "replay_report",
  "replay_run_id": "rpl_01jy7nfg01j2k3l4m5n6o7p8qr",
  "generated_at": "2026-06-15T16:19:42Z",
  "status": "completed",
  "baseline": "managed",
  "candidate": "byoc_us_east",
  "replay_class": "tokenized_performance",
  "provenance": {
    "trace_schema_version": "2026-06-01",
    "replay_runner_version": "rr_1.0.0",
    "runtime_engine_version": "api-core/0.1.0",
    "model_alias_release": { "baseline": null, "candidate": null },
    "resolved_model_revision": { "baseline": "unpinned", "candidate": "unpinned" },
    "prompt_compiler_revision": "pc_11",
    "tokenizer_revision": "tok_7",
    "cache_mode": "provider_default",
    "warmup_period_s": 0,
    "cold_start_period_s": 0,
    "request_arrival_schedule": "as_fast_as_possible",
    "concurrency": 1,
    "retry_policy": "none",
    "provider_rate_limits": "provider_default",
    "repetitions": 1,
    "confidence_intervals": "95% normal-approximation on per-request samples; p50/p95/p99 reported",
    "quality_evaluator_version": "qe_none",
    "failures_and_dropped": { "failures": 0, "dropped": 0 }
  },
  "traffic_manifest": {
    "ref": "usage:500",
    "traces": 500,
    "total_input_tokens": 8400000,
    "total_output_tokens": 210000,
    "total_realized_reuse_tokens": 3100000,
    "traces_with_full_fidelity_payload": 0
  },
  "metrics": {
    "metric_deltas": {
      "provider_cost_micros": { "baseline": 22100000, "candidate": 22100000, "delta": 0, "pct": 0 },
      "reuse_capture_pct": { "baseline": 36.9, "candidate": 36.9 }
    },
    "confidence_intervals": { "per_request_cost_savings_micros": { "n": 500, "mean": 0, "p50": 0, "p95": 0, "p99": 0, "ci95_low": 0, "ci95_high": 0 } },
    "recommended_profile": "baseline",
    "failures": 0,
    "dropped": 0
  },
  "assumptions": ["..."],
  "quality_guardrails": "token-shape replay does not execute inference; output quality is unchanged from the recorded run.",
  "known_limitations": ["..."],
  "recommended_profile": "baseline",
  "evidence_digest": "sig_b71e...4d"
}
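Two figures in the example report can be recomputed from the report's own fields, which is a useful sanity check. In the sketch below, reuse_capture_pct and summarize are hypothetical helpers. The reuse formula is inferred from the example (3,100,000 realized reuse tokens over 8,400,000 input tokens gives the baseline's 36.9), not from a documented definition. summarize applies the stated confidence-interval policy, mean ± 1.96·s/√n with s the sample standard deviation; its percentile interpolation (statistics.quantiles with method="inclusive") is an assumption, so p95 and p99 may differ slightly from the runner's.

```python
# Recompute report figures from the report's own numbers. Hypothetical helpers:
# the reuse ratio is inferred from the example report, and the percentile
# interpolation method is an assumption of this sketch.
import math
import statistics
from typing import Any

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def reuse_capture_pct(manifest: dict[str, Any]) -> float:
    """Realized reuse tokens as a share of input tokens, in percent (1 decimal)."""
    total = manifest["total_input_tokens"]
    if total == 0:
        return 0.0
    return round(100 * manifest["total_realized_reuse_tokens"] / total, 1)


def summarize(samples: list[float]) -> dict[str, float]:
    """95% normal-approximation CI on the mean, plus p50/p95/p99."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples for a standard deviation")
    mean = statistics.fmean(samples)
    half_width = Z_95 * statistics.stdev(samples) / math.sqrt(n)
    cuts = statistics.quantiles(samples, n=100, method="inclusive")  # 99 cut points
    return {
        "n": n,
        "mean": mean,
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "ci95_low": mean - half_width,
        "ci95_high": mean + half_width,
    }
```

On the example traffic_manifest block, reuse_capture_pct returns 36.9, and summarize over 500 zero-valued savings samples returns the all-zero interval shown in the example.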
provenance · object

Every §20.5 field: schema/runner/engine versions, alias releases, resolved model revisions, prompt-compiler and tokenizer revisions, cache mode, warmup and cold-start periods, arrival schedule, concurrency, retry policy, provider rate limits, repetitions, confidence-interval policy, quality-evaluator version, and the failure/dropped counts.

traffic_manifest · object

A summary of the replayed traffic: trace count, total input/output/reuse tokens, and how many traces carried a full-fidelity payload.

metrics · object

Class-specific comparison: metric deltas, confidence intervals, the recommended profile, and failure and dropped counts.

quality_guardrails · string

What the class does and does not guarantee about output quality.

recommended_profile · string

The profile the report recommends (baseline, candidate, either, or n/a).

evidence_digest · string

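Computed once, when the run completes, over the provenance block, the traffic-manifest summary, and the metrics. When a signing key is configured, it is a keyed HMAC-SHA256 (prefixed sig_) that only Zumik can produce, which makes the report tamper-evident. Without a signing key it is an unkeyed SHA-256 checksum (prefixed sha256_), which anyone can recompute: it detects accidental corruption but not deliberate tampering.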
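The difference between the two digest forms can be illustrated with Python's hashlib and hmac modules. This is a sketch only: the canonical serialization it hashes (sorted-key, compact JSON of the provenance, traffic_manifest, and metrics blocks) and the hex encoding after the prefix are hypothetical, because this page does not document the exact bytes Zumik hashes, so values computed this way are not expected to match a real evidence_digest. canonical_bytes, evidence_digest, and digest_matches are illustrative names.

```python
# Illustrates sig_ (keyed HMAC-SHA256) vs sha256_ (unkeyed checksum) digests.
# ASSUMPTIONS: the canonical serialization and hex encoding below are hypothetical;
# the real byte layout is not documented, so results will not match real reports.
import hashlib
import hmac
import json
from typing import Any, Optional


def canonical_bytes(report: dict[str, Any]) -> bytes:
    pinned = {k: report[k] for k in ("provenance", "traffic_manifest", "metrics")}
    return json.dumps(pinned, sort_keys=True, separators=(",", ":")).encode("utf-8")


def evidence_digest(report: dict[str, Any], key: Optional[bytes] = None) -> str:
    data = canonical_bytes(report)
    if key is None:
        # Anyone can recompute this: it detects corruption, not tampering.
        return "sha256_" + hashlib.sha256(data).hexdigest()
    # Only a holder of the key can produce this value.
    return "sig_" + hmac.new(key, data, hashlib.sha256).hexdigest()


def digest_matches(report: dict[str, Any], key: Optional[bytes] = None) -> bool:
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(evidence_digest(report, key), report["evidence_digest"])
```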

Errors

| Status | Code | When |
| --- | --- | --- |
| 400 | invalid_request_error | baseline or candidate is empty, a class that needs a manifest got none, traffic_manifest_ref is unsupported, scheduled_for is malformed, or you tried to cancel a run that is not queued. |
| 401 | invalid_api_key | Missing or invalid API key. |
| 404 | invalid_request_error | The run does not exist in this project. |

See the full table on errors.
