OpenAI

OpenAI through Zumik - automatic prompt caching at a 50% read discount, the Batch API at 50% off, flex/default/scale service tiers, response retrieval and background mode, and when the broker routes here.

OpenAI is the stable first choice for broad model coverage and native Responses API compatibility. It caches automatically with no client instrumentation, runs a Batch API for non-interactive work, and exposes the widest set of stateful features (response retrieval, conversation state, background mode) of the five first-class providers.

It is available on both the managed-provider and BYOK profiles. Requests that resolve here report Agent-Resolved-Provider: openai.

Caching economics

OpenAI prompt caching activates automatically for eligible requests; there is nothing to mark. Stable content (system prompt, tool definitions, response schema) must appear before dynamic content, because a single volatile token near the top resets the prefix.

Fact	Value
Cache type	Automatic
Minimum cacheable prefix	1,024 tokens
Cache-read discount	50% on cached input tokens
Default TTL	600 seconds
Extended TTL	up to 86,400 seconds (24h) on supported models
Cached-token reporting	Yes (`cached_tokens` in usage)
Manual cache clear	Not supported

Because manual clearing is not available, a managed-provider purge for OpenAI-cached state is expiry-bound: the guarantee is best_effort_expiry, not physical purge. For state you need to invalidate on demand, that is a reason to consider BYOC. See retention and purge.

Batch and service tiers

The Batch API delivers a 50% cost reduction at up to 24h turnaround. Route non-interactive work here: background evaluations, diagnostic trace reprocessing, bulk embeddings, pre-computable tool calls, replay experiments. Never route interactive or QoS interactive requests to Batch.

Service tiers map to the QoS class in the agent hints contract:

flex for cost-priority, non-urgent tasks.
default for standard interactive traffic.
scale for guaranteed-throughput commitments.

Stateful features

OpenAI is the only first-class provider here that reports response_retrieval_supported, conversation_state_supported, and background_mode_supported. That makes it the natural home for long-running background generations and server-side conversation state when you want the provider to hold it rather than Zumik's sessions.

When the broker routes here

Broad coverage default

The stable first choice when no other provider has a decisive cost or capability edge for the request.

Responses API compatibility

Workloads built around the native Responses API and its stateful features.

Background and batch work

Non-interactive tasks that take the 50% Batch discount, or background-mode generations.

Forgiving caching

Long stable prefixes that cache automatically - just keep volatile content off the front.

At a glance

Capability	Value
Context window	128,000 tokens
Multimodal input	Yes
Live search	No
Dedicated deployment	No
Service tiers	flex, default, scale
Data retention	standard
Regions	us, eu

Manifest revision cap_2026_06_09. The capability manifest is what the broker routes on and pins per decision. Confirm a hit through cached_tokens in the /v1 usage object or the full reuse waterfall on /v2/usage.