OpenAI
OpenAI through Zumik - automatic prompt caching at a 50% read discount, the Batch API at 50% off, flex/default/scale service tiers, response retrieval and background mode, and when the broker routes here.
OpenAI is the stable first choice for broad model coverage and native Responses API compatibility. It caches automatically with no client instrumentation, runs a Batch API for non-interactive work, and exposes the widest set of stateful features (response retrieval, conversation state, background mode) of the five first-class providers.
It is available on both the managed-provider and BYOK profiles. Requests that resolve here report Agent-Resolved-Provider: openai.
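A minimal sketch of checking where a request resolved, assuming Agent-Resolved-Provider arrives as an HTTP response header and that your HTTP client exposes headers as a mapping. The helper names `resolved_provider` and `routed_to_openai` are hypothetical, not part of any Zumik SDK.

```python
from typing import Mapping, Optional


def resolved_provider(headers: Mapping[str, str]) -> Optional[str]:
    """Return the provider named in Agent-Resolved-Provider, or None if absent."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    for name, value in headers.items():
        if name.lower() == "agent-resolved-provider":
            return value.strip().lower()
    return None


def routed_to_openai(headers: Mapping[str, str]) -> bool:
    """True when the response reports that the request resolved to OpenAI."""
    return resolved_provider(headers) == "openai"
```

Pass the response headers of any completed request, for example `routed_to_openai(response.headers)` with a client whose `headers` attribute behaves like a mapping.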
Caching economics
OpenAI prompt caching activates automatically for eligible requests; there is nothing to mark. Caching matches on the prompt prefix, so stable content (system prompt, tool definitions, response schema) must appear before dynamic content: a single volatile token near the top changes the prefix and forfeits the cache hit for everything after it.
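The ordering rule can be illustrated with a small sketch. It compares how much of two consecutive prompts is shared when the volatile part comes last versus first. Characters stand in for tokens here, and `STABLE`, `stable_first`, and `volatile_first` are hypothetical names used only for illustration.

```python
# Stable content that is byte-identical across requests (illustrative only).
STABLE = "You are a support agent. Follow the policy below.\n" * 40


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two rendered prompts have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def stable_first(volatile: str, question: str) -> str:
    """Recommended layout: stable content, then per-request content."""
    return "\n".join([STABLE, volatile, question])


def volatile_first(volatile: str, question: str) -> str:
    """Problematic layout: per-request content ahead of the stable block."""
    return "\n".join([volatile, STABLE, question])


good = shared_prefix_len(stable_first("ts=1", "q1"), stable_first("ts=2", "q2"))
bad = shared_prefix_len(volatile_first("ts=1", "q1"), volatile_first("ts=2", "q2"))
print(f"stable-first shares {good} chars; volatile-first shares {bad}")
```

With the stable block first, the whole block is shared between the two requests. With the timestamp first, the prompts diverge after a few characters and nothing that follows can be reused.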
| Fact | Value |
|---|---|
| Cache type | Automatic |
| Minimum cacheable prefix | 1,024 tokens |
| Cache-read discount | 50% on cached input tokens |
| Default TTL | 600 seconds |
| Extended TTL | up to 86,400 seconds (24h) on supported models |
| Cached-token reporting | Yes (cached_tokens in usage) |
| Manual cache clear | Not supported |
Because manual clearing is not available, a managed-provider purge for OpenAI-cached state is
expiry-bound: the guarantee is best_effort_expiry, not physical purge. For state you need to
invalidate on demand, that is a reason to consider BYOC. See
retention and purge.
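As a rough worked example of the figures in the table above, the sketch below estimates input cost when part of the prompt is served from cache. It assumes cached tokens are billed at 50% of the normal input rate. The price of 2.00 per million tokens in the usage note is a made-up number, and `input_cost` is a hypothetical helper.

```python
CACHE_READ_DISCOUNT = 0.50   # from the table: 50% on cached input tokens
MIN_CACHEABLE_PREFIX = 1024  # from the table: prefixes below this are not cached


def input_cost(input_tokens: int, cached_tokens: int, price_per_million: float) -> float:
    """Estimate input cost given how many input tokens were cache reads."""
    if cached_tokens < 0 or cached_tokens > input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    # Cached tokens are charged at the discounted rate, the rest at full rate.
    billable = uncached + cached_tokens * (1 - CACHE_READ_DISCOUNT)
    return billable * price_per_million / 1_000_000


def could_cache(prefix_tokens: int) -> bool:
    """A stable prefix shorter than the minimum cannot produce a cache hit."""
    return prefix_tokens >= MIN_CACHEABLE_PREFIX
```

For a 10,000-token prompt with 8,000 cached tokens, `input_cost(10_000, 8_000, 2.00)` bills the equivalent of 6,000 full-price tokens instead of 10,000.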
Batch and service tiers
The Batch API delivers a 50% cost reduction with a turnaround of up to 24 hours. Route non-interactive work
here: background evaluations, diagnostic trace reprocessing, bulk embeddings, pre-computable tool
calls, replay experiments. Never route interactive requests, or any request whose QoS class is interactive, to Batch.
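The routing rule above might be sketched as a simple eligibility check. The `Job` type and its fields are hypothetical. Only the 24-hour turnaround, the 50% discount, and the ban on interactive traffic come from this page.

```python
from dataclasses import dataclass

BATCH_MAX_TURNAROUND_S = 24 * 60 * 60  # Batch may take up to 24 hours
BATCH_DISCOUNT = 0.50                  # Batch costs 50% less


@dataclass(frozen=True)
class Job:
    qos_class: str   # hypothetical field; "interactive" is the class named above
    max_wait_s: int  # how long the caller can wait for a result, in seconds


def can_use_batch(job: Job) -> bool:
    """Batch is only safe for work that tolerates the full turnaround window."""
    if job.qos_class == "interactive":
        return False  # interactive traffic must never go to Batch
    return job.max_wait_s >= BATCH_MAX_TURNAROUND_S


def batch_cost(standard_cost: float) -> float:
    """Cost of the same work when submitted through Batch."""
    return standard_cost * (1 - BATCH_DISCOUNT)
```

A nightly evaluation run that can wait two days qualifies. A chat turn does not, however cheap Batch would make it.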
Service tiers map to the QoS class in the agent hints contract:
- flex for cost-priority, non-urgent tasks.
- default for standard interactive traffic.
- scale for guaranteed-throughput commitments.
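One way to picture the mapping is a lookup table. The QoS class names below (`cost_priority`, `interactive`, `guaranteed_throughput`) are placeholders chosen for this sketch; the agent hints contract defines the real identifiers. Only the tier names flex, default, and scale come from this page.

```python
# Hypothetical QoS class names mapped to the service tiers listed above.
QOS_TO_SERVICE_TIER = {
    "cost_priority": "flex",           # cost-priority, non-urgent tasks
    "interactive": "default",          # standard interactive traffic
    "guaranteed_throughput": "scale",  # guaranteed-throughput commitments
}


def service_tier_for(qos_class: str) -> str:
    """Look up the service tier for a QoS class, rejecting unknown classes."""
    try:
        return QOS_TO_SERVICE_TIER[qos_class]
    except KeyError:
        raise ValueError(f"unknown QoS class: {qos_class!r}") from None
```

Rejecting unknown classes rather than silently falling back is a choice made for this sketch, so that a misspelled class does not land on an unintended tier.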
Stateful features
OpenAI is the only first-class provider here that reports response_retrieval_supported,
conversation_state_supported, and background_mode_supported. That makes it the natural home for
long-running background generations and for server-side conversation state when you want the provider to
hold that state rather than keeping it in Zumik's sessions.
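A minimal sketch of gating on those three flags before relying on provider-held state. It assumes the capability manifest can be read as a flat mapping of flag name to boolean, which is an assumption about its shape. The function names are hypothetical.

```python
from typing import List, Mapping

# Flag names as reported for OpenAI on this page.
STATEFUL_FLAGS = (
    "response_retrieval_supported",
    "conversation_state_supported",
    "background_mode_supported",
)


def missing_stateful_flags(manifest: Mapping[str, bool]) -> List[str]:
    """Return the stateful flags that are absent or false in a manifest."""
    return [flag for flag in STATEFUL_FLAGS if not manifest.get(flag, False)]


def supports_provider_held_state(manifest: Mapping[str, bool]) -> bool:
    """True only when every stateful flag is reported as supported."""
    return not missing_stateful_flags(manifest)
```

Checking the flags rather than the provider name keeps the gate correct if another provider reports the same capabilities in a later manifest revision.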
When the broker routes here
Broad coverage default
The stable first choice when no other provider has a decisive cost or capability edge for the request.
Responses API compatibility
Workloads built around the native Responses API and its stateful features.
Background and batch work
Non-interactive tasks that take the 50% Batch discount, or background-mode generations.
Forgiving caching
Long stable prefixes that cache automatically - just keep volatile content off the front.
At a glance
| Capability | Value |
|---|---|
| Context window | 128,000 tokens |
| Multimodal input | Yes |
| Live search | No |
| Dedicated deployment | No |
| Service tiers | flex, default, scale |
| Data retention | standard |
| Regions | us, eu |
Manifest revision cap_2026_06_09. The capability manifest is what
the broker routes on and pins per decision. Confirm a hit through cached_tokens in the /v1 usage
object or the full reuse waterfall on /v2/usage.
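A hedged sketch of confirming a hit from a usage object that has already been parsed into a dictionary. The exact shape of Zumik's /v1 usage object is not shown on this page, so the helper checks for cached_tokens at the top level and, as an assumption, under the `prompt_tokens_details` and `input_tokens_details` keys that OpenAI's own APIs use.

```python
from typing import Any, Mapping

# Nested locations where OpenAI's own APIs place cached_tokens (assumed here).
_DETAIL_KEYS = ("prompt_tokens_details", "input_tokens_details")


def cached_tokens(usage: Mapping[str, Any]) -> int:
    """Extract the cached-token count from a usage object, or 0 if absent."""
    value = usage.get("cached_tokens")
    if value is None:
        for key in _DETAIL_KEYS:
            details = usage.get(key) or {}
            if "cached_tokens" in details:
                value = details["cached_tokens"]
                break
    return int(value or 0)


def is_cache_hit(usage: Mapping[str, Any]) -> bool:
    """A request hit the cache when at least one input token was a cache read."""
    return cached_tokens(usage) > 0
```

A count of zero on a prompt longer than the 1,024-token minimum usually points at volatile content near the front of the prompt or an expired TTL.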