Anthropic
Anthropic through Zumik - the deepest cache-read discount of any managed provider at 90%, explicit cache_control breakpoints, the Message Batches API at 50% off, zero-retention availability, and when the broker routes here.
Anthropic carries the single highest-leverage economics of the five first-class providers: a 90% cache-read discount, the deepest of any managed provider. For agent workloads where the system prompt, tool definitions, and knowledge base are stable across hundreds of requests per session, Anthropic frequently delivers the lowest effective input-token cost of any frontier provider - and once cache reads dominate, sometimes below open-source inference.
It is available on both the managed-provider and
BYOK profiles. Requests that resolve here report
Agent-Resolved-Provider: anthropic.
Caching economics
Anthropic caching is explicit: you place cache_control breakpoints at the end of each stable
block (system prompt, tool definitions, long documents). A cache write costs more than a normal token,
so mark only blocks you will actually reuse, and never mark volatile content.
| Fact | Value |
|---|---|
| Cache type | Explicit (cache_control breakpoints) |
| Minimum cacheable block | 1,024 tokens |
| Cache-read discount | 90% on cache reads |
| Default TTL | 300 seconds (5 min) |
| Extended TTL | 3,600 seconds (1h) on Claude 3.5+ |
| Cached-token reporting | Yes |
| Manual cache clear | Not supported |
Tip
Measure cache_creation_input_tokens against cache_read_input_tokens per session to verify your
capture rate. The 90% discount only compounds if reads dominate writes - which is exactly what a
stable prefix across many turns produces.
Batch
The Message Batches API delivers a 50% cost reduction at up to 24h turnaround. Use it for the same
non-interactive classes as everywhere else: background evaluations, diagnostic reprocessing,
pre-computable tool calls, replay experiments. Anthropic exposes a single standard service tier;
there is no flex/scale split to match to QoS.
Retention
Anthropic's data_retention_class is zero_retention_available. That makes it a strong fit for
BYOK when an account-level zero-retention policy has to apply to the inference
calls, and it factors into the purge guarantee the platform can offer for Anthropic-cached state. See
retention and purge.
When the broker routes here
Repeated long-context prompts
Route here when stable prefixes exceed ~1,024 tokens and recurrence is high - the 90% cache-read discount compounds faster than any other provider mechanism.
Stable agent context
A fixed system prompt, tool registry, and knowledge base reused across hundreds of requests is the canonical Anthropic win.
Zero-retention requirements
Workloads that need zero-retention available at the account level, typically under BYOK.
Reasoning-heavy tasks
Extended thinking for reasoning work; gate the added latency behind the right alias class rather than enabling it everywhere.
At a glance
| Capability | Value |
|---|---|
| Context window | 200,000 tokens |
| Multimodal input | Yes |
| Live search | No |
| Dedicated deployment | No |
| Service tiers | standard |
| Data retention | zero_retention_available |
| Regions | us |
Manifest revision cap_2026_06_09. The capability manifest records
that Anthropic holds the deepest cache discount in the set; the broker uses these facts to decide where
a repeated long-context prompt is cheapest.
OpenAI
OpenAI through Zumik - automatic prompt caching at a 50% read discount, the Batch API at 50% off, flex/default/scale service tiers, response retrieval and background mode, and when the broker routes here.
xAI (Grok)
xAI Grok through Zumik - context caching at a 75% read discount, live web-search grounding, Grok-3 and the cost-optimized Grok-3 Mini, no Batch tier, and when the broker routes here.