Replay

The trace envelope and its privacy modes, the five replay classes, the metrics a replay compares, and the signed report with provenance and an evidence digest - the system that justifies (or rejects) a BYOC migration.

Replay re-runs recorded traffic against a candidate configuration and produces a signed comparison. It is how Zumik turns "self-hosting might be cheaper" into evidence: a baseline, a candidate, a metric delta, and a digest a stakeholder can verify. Replay also drives provider comparison, prompt-ordering experiments, cache-retention experiments, router tuning, QoS validation, regression testing, and purge verification.

The trace envelope

Replay runs over a formal trace schema. Each trace records the request shape and what was observed, keyed to a privacy_mode that decides how much is stored.

{
  "trace_schema_version": "2026-06-01",
  "trace_id": "trc_…",
  "project_id": "prj_…",
  "privacy_mode": "tokenized",
  "request": {
    "api_surface": "v1_responses",
    "snapshot_id": "snp_…",
    "ordered_block_manifest": ["blk_1", "blk_2", "blk_3"],
    "prompt_compiler_revision": "pc_11",
    "alias_release_id": "alr_…"
  },
  "schedule": { "arrival_offset_ms": 1850, "session_id": "ses_…", "branch_id": "br_…", "concurrency_group": "cg_4" },
  "observed": {
    "resolved_target": "openai/gpt-4o@2025-01-01",
    "ttft_ms": 410, "latency_ms": 2880,
    "input_tokens": 18240, "candidate_reuse_tokens": 17100,
    "realized_reused_tokens": 14980, "output_tokens": 412, "attempt_count": 1
  }
}
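
As an illustration of how the observed block can be consumed, the sketch below reads the reuse-related fields from a trace and derives a reuse capture rate. The definition used here (realized_reused_tokens divided by candidate_reuse_tokens) is an assumption, not something this page specifies, and the ObservedUsage and parse_observed names are hypothetical helpers rather than part of any Zumik SDK.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedUsage:
    """Reuse-related subset of a trace's `observed` block (hypothetical helper)."""
    input_tokens: int
    candidate_reuse_tokens: int
    realized_reused_tokens: int

    def reuse_capture_rate(self) -> float:
        # ASSUMED definition: share of reusable tokens that were actually reused.
        if self.candidate_reuse_tokens == 0:
            return 0.0
        return self.realized_reused_tokens / self.candidate_reuse_tokens


def parse_observed(trace_json: str) -> ObservedUsage:
    """Extract the token counts from a trace envelope given as a JSON string."""
    observed = json.loads(trace_json)["observed"]
    return ObservedUsage(
        input_tokens=observed["input_tokens"],
        candidate_reuse_tokens=observed["candidate_reuse_tokens"],
        realized_reused_tokens=observed["realized_reused_tokens"],
    )


# Token counts copied from the example envelope above.
EXAMPLE_TRACE = json.dumps({
    "privacy_mode": "tokenized",
    "observed": {
        "input_tokens": 18240,
        "candidate_reuse_tokens": 17100,
        "realized_reused_tokens": 14980,
    },
})

usage = parse_observed(EXAMPLE_TRACE)
print(f"{usage.reuse_capture_rate():.3f}")  # prints 0.876
```

Only numeric fields are read, so the same code works for a metadata or tokenized trace without touching any prompt content.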

Privacy modes

Replay never requires raw prompt text. Choose the least-revealing mode that still answers your question:

Mode	Stored	Use for
`metadata`	Lengths, timing, fingerprints, lineage, usage.	Low-risk diagnostics.
`tokenized`	Token ids plus redacted metadata.	Faithful performance replay without plaintext.
`encrypted_full_fidelity`	Encrypted source payloads under your policy.	Output-quality evaluation.
`synthetic`	A generated workload with matching structure.	Public benchmarks and stress testing.

Raw prompt text is never retained by default - see Data privacy and retention.

Replay classes

replay_class sets how much work the run does:

Class	What it does
`routing_simulation`	Compares routing policy without running inference. Cheapest; runs inline and completes immediately.
`synthetic_performance`	Tests throughput and TTFT with structurally similar prompts.
`tokenized_performance`	Tests exact token-shape reuse without plaintext.
`full_fidelity_evaluation`	Compares latency, outputs, tool calls, and quality.
`purge_verification`	Confirms invalidated artifacts cannot be reused (see Purge semantics).

Run a replay

Create a run with POST /v2/replay-runs. routing_simulation is the default replay class and runs without inference, so it returns status: completed right away, with routing-agreement metrics:

curl https://api.zumik.ai/v2/replay-runs \
  -H "Authorization: Bearer zk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "baseline": "openai/gpt-4o@2025-01-01",
    "candidate": "byoc_us_east",
    "replay_class": "routing_simulation",
    "repetitions": 100
  }'

Response

{
  "id": "rpl_01jy…",
  "object": "replay_run",
  "status": "completed",
  "baseline": "openai/gpt-4o@2025-01-01",
  "candidate": "byoc_us_east",
  "replay_class": "routing_simulation",
  "repetitions": 100,
  "metrics": { "routing_agreement": 0.94, "divergence": 0.06, "samples": 100 }
}
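
The same request can be built from Python. The sketch below mirrors the curl call using only the standard library; the build_replay_request helper is hypothetical, and the request is constructed but not sent, so nothing here depends on network access or a real key.

```python
import json
import urllib.request

API_BASE = "https://api.zumik.ai"  # host taken from the curl example


def build_replay_request(api_key: str, baseline: str, candidate: str,
                         replay_class: str = "routing_simulation",
                         **extra) -> urllib.request.Request:
    """Build (but do not send) a POST /v2/replay-runs request.

    Extra keyword arguments such as repetitions or traffic_manifest_ref are
    passed straight through into the JSON body.
    """
    body = {
        "baseline": baseline,
        "candidate": candidate,
        "replay_class": replay_class,
        **extra,
    }
    return urllib.request.Request(
        f"{API_BASE}/v2/replay-runs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_replay_request(
    "zk_live_...",  # placeholder key, as in the curl example
    baseline="openai/gpt-4o@2025-01-01",
    candidate="byoc_us_east",
    repetitions=100,
)
print(req.get_method(), req.full_url)

# To actually send it:
#   with urllib.request.urlopen(req) as resp:
#       run = json.load(resp)
```

Keeping construction separate from sending makes the payload easy to inspect or log before a run is created.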

Performance and evaluation classes run over a captured traffic manifest. Supply it inline as a traces array, or set traffic_manifest_ref to "usage:N" to build one from your last N usage events (token shapes and timing only, never prompt text):

curl https://api.zumik.ai/v2/replay-runs \
  -H "Authorization: Bearer zk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "baseline": "managed",
    "candidate": "byoc_us_east",
    "replay_class": "tokenized_performance",
    "traffic_manifest_ref": "usage:500"
  }'

These classes are scheduled onto the background runner and move through queued → running → completed; poll GET /v2/replay-runs/{id}, or list runs, to track progress. Setting scheduled_for (an RFC 3339 timestamp) holds a run until that time; a run that is still queued can be canceled with POST /v2/replay-runs/{id}/cancel. full_fidelity_evaluation re-executes recorded messages for real and is budget-gated; the other classes are deterministic projections over the manifest.
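
The polling step described above might look like the following sketch. It takes the fetch function as a parameter so the loop can be exercised without network access; wait_for_run and its parameters are hypothetical names, and the loop treats any status other than queued or running as final, which is an assumption since this page names only those states and completed.

```python
import time
from typing import Callable

IN_PROGRESS = {"queued", "running"}  # the two non-final states named in the text


def wait_for_run(fetch_run: Callable[[str], dict], run_id: str,
                 interval_s: float = 2.0, max_polls: int = 150,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the run leaves the queued/running states, then return it.

    fetch_run stands in for a GET /v2/replay-runs/{id} call that returns the
    decoded JSON body as a dict.
    """
    for _ in range(max_polls):
        run = fetch_run(run_id)
        if run["status"] not in IN_PROGRESS:
            return run
        sleep(interval_s)
    raise TimeoutError(f"replay run {run_id} still in progress after {max_polls} polls")


# Demo with a stand-in fetcher that walks through the documented states.
_statuses = iter(["queued", "running", "completed"])
final = wait_for_run(
    lambda rid: {"id": rid, "status": next(_statuses)},
    "rpl_example",             # made-up id for the demo
    sleep=lambda _seconds: None,  # skip real waiting in the demo
)
print(final["status"])  # prints completed
```

A bounded number of polls keeps a stuck or long-scheduled run from blocking the caller indefinitely; a run held by scheduled_for would need a correspondingly larger budget.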

What replay compares

A performance or evaluation run compares TTFT, end-to-end latency, output-token rate, reuse capture rate, tool-call validity, structured-output validity, error rate, deadline misses, GPU-seconds where measurable, provider cost, and a task/semantic-quality score. Outputs are not required to be byte-identical unless the backend explicitly supports deterministic reproduction.
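
To make the idea of a metric delta concrete, here is a minimal sketch that compares two metric dictionaries. The metric_deltas helper, the metric key names, and the sample numbers are illustrative assumptions, not the format of a real replay report or measured results.

```python
from typing import Dict, Optional


def metric_deltas(baseline: Dict[str, float],
                  candidate: Dict[str, float]) -> Dict[str, Dict[str, Optional[float]]]:
    """Absolute and relative change (candidate minus baseline) per shared metric."""
    deltas: Dict[str, Dict[str, Optional[float]]] = {}
    for name in sorted(baseline.keys() & candidate.keys()):
        base, cand = baseline[name], candidate[name]
        absolute = cand - base
        # Relative change is undefined when the baseline value is zero.
        relative = absolute / base if base != 0 else None
        deltas[name] = {"absolute": absolute, "relative": relative}
    return deltas


# Made-up values purely for illustration.
baseline_metrics = {"ttft_ms": 410.0, "error_rate": 0.0}
candidate_metrics = {"ttft_ms": 328.0, "error_rate": 0.0}

for name, delta in metric_deltas(baseline_metrics, candidate_metrics).items():
    print(name, delta)
```

Whether a negative delta is an improvement depends on the metric: lower is better for TTFT, latency, and error rate, while higher is better for output-token rate and reuse capture rate.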

The signed report

GET /v2/replay-runs/{id}/report returns a report built for a decision you can defend:

{
  "object": "replay_report",
  "replay_run_id": "rpl_01jy…",
  "generated_at": "2026-06-15T12:05:00Z",
  "baseline": "openai/gpt-4o@2025-01-01",
  "candidate": "byoc_us_east",
  "replay_class": "routing_simulation",
  "provenance": {
    "trace_schema_version": "2026-06-01",
    "replay_runner_version": "rr_1.0.0",
    "cache_mode": "provider_default",
    "concurrency": 1,
    "repetitions": 100
  },
  "metrics": { "routing_agreement": 0.94, "divergence": 0.06, "samples": 100 },
  "assumptions": [
    "Outputs are not required to be byte-identical unless the backend supports deterministic reproduction.",
    "Cost and latency deltas are computed against the recorded traffic manifest, not live re-billing."
  ],
  "known_limitations": ["routing_simulation does not execute inference; it compares routing policy only."],
  "evidence_digest": "sig_9f86d081…"
}

The provenance block records the trace schema, runner version, cache mode, concurrency, and repetitions; assumptions and known_limitations state what the run does and does not prove. The evidence_digest is computed once when the run completes and pinned to it, so it never drifts if an alias is edited later. When Zumik is configured with a signing key, the digest is a keyed HMAC-SHA256 (sig_<64 hex>) that only Zumik can produce, so an altered report won't match, and Zumik can verify integrity on request. Without a signing key it is an honest unkeyed checksum (sha256_<64 hex>), labeled distinctly so it is never mistaken for a signature.
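
Because the two digest forms are distinguished by prefix, a consumer can check which kind it received before treating a report as signed. The sketch below classifies an evidence_digest by the two formats described above; digest_kind is a hypothetical helper, and it assumes the hex portion is lowercase, as in the example report. It does not verify a signature, which per the text only Zumik can do.

```python
import re

# Formats described in the text: sig_<64 hex> (keyed) and sha256_<64 hex> (unkeyed).
# ASSUMPTION: the hex digits are lowercase.
_SIGNED = re.compile(r"sig_[0-9a-f]{64}")
_CHECKSUM = re.compile(r"sha256_[0-9a-f]{64}")


def digest_kind(evidence_digest: str) -> str:
    """Return 'signed', 'checksum', or 'unknown' for an evidence_digest value."""
    if _SIGNED.fullmatch(evidence_digest):
        return "signed"
    if _CHECKSUM.fullmatch(evidence_digest):
        return "checksum"
    return "unknown"


# Synthetic digests built only to exercise the format check.
print(digest_kind("sig_" + "ab" * 32))     # prints signed
print(digest_kind("sha256_" + "0f" * 32))  # prints checksum
print(digest_kind("sig_1234"))             # prints unknown
```

A stakeholder workflow could, for instance, refuse to archive a report as signed evidence unless digest_kind returns signed.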

When replay justifies a profile change

A self-hosted profile is activated only when replay proves that it beats the managed path once full cost is accounted for.