xAI (Grok)
xAI Grok through Zumik - context caching at a 75% read discount, live web-search grounding, Grok-3 and the cost-optimized Grok-3 Mini, no Batch tier, and when the broker routes here.
xAI Grok is the provider to reach for when a request needs live web-search grounding or a
fast, cost-optimized frontier response. Grok-3 covers frontier reasoning and live-web-grounded tasks;
Grok-3 Mini is the cost-sensitive routing target for high-volume work where full Grok-3 capability is
not required. It is the only first-class provider in this set that reports live_search_supported,
and it is available on both the managed-provider and BYOK profiles. Requests that resolve here
report Agent-Resolved-Provider: xai.
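To confirm after the fact that a request was actually served by xAI, a client can inspect the Agent-Resolved-Provider response header. The sketch below assumes the response headers are available as a plain string-to-string mapping, as most HTTP clients can provide. The helper names are hypothetical, and the case-insensitive header match follows general HTTP header semantics rather than any documented Zumik behavior.

```python
from __future__ import annotations

from collections.abc import Mapping


def resolved_provider(headers: Mapping[str, str]) -> str | None:
    """Return the Agent-Resolved-Provider value, or None if the header is absent.

    HTTP header names are case-insensitive, so match on the lowercased name.
    """
    for name, value in headers.items():
        if name.lower() == "agent-resolved-provider":
            return value.strip().lower()
    return None


def routed_to_xai(headers: Mapping[str, str]) -> bool:
    """True when the broker reports that the request resolved to xAI."""
    return resolved_provider(headers) == "xai"
```

For example, `routed_to_xai({"Agent-Resolved-Provider": "xai"})` is true, while a response that resolved to another provider returns false.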
Caching economics
xAI caches a stable context across consecutive requests, and reads from that cached context are billed at a 75% discount. Monitor the cache hit rate through the cached-token counts in the usage metadata, and keep the stable prefix unchanged across calls so the cached context stays warm.
| Fact | Value |
|---|---|
| Cache type | Explicit (cached context) |
| Minimum cacheable prefix | 1,024 tokens |
| Cache-read discount | 75% |
| Default TTL | 300 seconds (5 minutes) |
| Extended TTL | 3,600 seconds (1 hour) |
| Cached-token reporting | Yes |
| Manual cache clear | Not supported |
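The arithmetic behind the 75% read discount can be sketched as follows. Cached tokens are billed at 25% of the base input rate, and uncached tokens are billed at the full rate. The sketch assumes that the usage metadata exposes a total prompt-token count and a cached-token count (the field names `prompt_tokens` and `cached_tokens` are hypothetical), and that the per-million-token price is whatever your plan charges. The $3.00 figure used below is illustrative only.

```python
from dataclasses import dataclass

CACHE_READ_DISCOUNT = 0.75    # cache-read discount from the table above
MIN_CACHEABLE_PREFIX = 1_024  # minimum cacheable prefix, in tokens


@dataclass
class Usage:
    prompt_tokens: int  # hypothetical field: all input tokens, cached + uncached
    cached_tokens: int  # hypothetical field: input tokens served from the cache


def input_cost(usage: Usage, price_per_mtok: float) -> float:
    """Input cost in dollars, billing cached tokens at (1 - discount) of the base rate."""
    uncached = usage.prompt_tokens - usage.cached_tokens
    cached_rate = price_per_mtok * (1 - CACHE_READ_DISCOUNT)
    return (uncached * price_per_mtok + usage.cached_tokens * cached_rate) / 1_000_000


def cache_hit_rate(usage: Usage) -> float:
    """Fraction of input tokens that were read from the cached context."""
    return usage.cached_tokens / usage.prompt_tokens if usage.prompt_tokens else 0.0


def prefix_is_cacheable(prefix_tokens: int) -> bool:
    """A stable prefix shorter than the minimum is not eligible for caching."""
    return prefix_tokens >= MIN_CACHEABLE_PREFIX
```

With a hypothetical $3.00 per million input tokens, a 10,000-token prompt that has 8,000 cached tokens costs $0.012 instead of $0.03, at a hit rate of 0.8.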
No Batch tier
xAI has no Batch API (batch_api_supported is false). Route background and batch-class work to a
provider that has one (OpenAI, Anthropic, or Gemini) and keep xAI for the interactive and
live-grounded traffic it is best at. xAI exposes a single standard service tier.
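The fallback rule above can be expressed as a filter over capability manifests. In this minimal sketch, the manifest dictionary and the provider keys other than `xai` are hypothetical. Only the batch_api_supported flag and the fact that xAI reports it as false come from this page. The preferred provider stays first whenever it qualifies.

```python
# Hypothetical excerpt of capability manifests; only batch_api_supported is from the docs.
MANIFESTS: dict[str, dict[str, bool]] = {
    "xai": {"batch_api_supported": False},
    "openai": {"batch_api_supported": True},
    "anthropic": {"batch_api_supported": True},
    "gemini": {"batch_api_supported": True},
}


def providers_for(workload: str, preferred: str,
                  manifests: dict[str, dict[str, bool]] = MANIFESTS) -> list[str]:
    """Candidate providers in priority order, preferred first.

    Batch-class work drops every provider whose manifest lacks a Batch API.
    """
    order = [preferred] + [p for p in manifests if p != preferred]
    if workload == "batch":
        return [p for p in order if manifests[p]["batch_api_supported"]]
    return order
```

Calling `providers_for("batch", "xai")` skips xAI and falls through to the providers that report a Batch API, while interactive work keeps xAI at the front.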
Live search
Grok supports real-time web-search grounding, which lets a request draw on up-to-date information
without a separate retrieval pipeline. The capability is gated behind the live_search_supported
manifest flag, so the broker sends grounding-dependent work only to providers that report it; among
first-class providers, that is xAI alone.
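The gating rule can be sketched as a filter over manifests. A provider is eligible for a request when it reports live_search_supported or when the request does not need grounding at all. The `Manifest` type and the `eligible` helper are hypothetical illustrations, not the broker's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    provider: str
    live_search_supported: bool  # flag name from the capability manifest


def eligible(manifests: list[Manifest], needs_grounding: bool) -> list[str]:
    """Providers that can serve the request.

    Grounding-dependent requests keep only providers that report live search.
    Other requests leave the candidate list unchanged.
    """
    return [m.provider for m in manifests
            if m.live_search_supported or not needs_grounding]
```

Under this rule, a grounding-dependent request narrows to xAI, while ungrounded traffic can still go to any provider.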
When the broker routes here
Live-grounded reasoning
Tasks that need current web information inline, where a Perplexity-style grounding step would otherwise be required.
Cost-sensitive frontier
Grok-3 Mini for high-volume interactive work where a frontier model is needed but cost per token matters more than maximum capability.
Fast lightweight responses
Grok-3 Mini's time-to-first-token is competitive with that of other small, cost-optimized models.
Warm stable prefixes
Long stable contexts reused across consecutive calls take the 75% cached-context discount.
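The choice between the two models can be summarized as a small decision function. This is a sketch under stated assumptions. The request attributes are hypothetical, the model identifiers `grok-3` and `grok-3-mini` may not match the names Zumik exposes, and sending grounded work to Grok-3 is a conservative reading of the overview above, not a documented restriction. Warm prefixes affect cost rather than model choice, so they are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Request:
    needs_live_search: bool = False  # hypothetical: needs current web information inline
    high_volume: bool = False        # hypothetical: cost per token outweighs peak capability
    latency_sensitive: bool = False  # hypothetical: time-to-first-token matters most


def pick_grok_model(req: Request) -> str:
    """Pick a Grok model for a request that the broker has already routed to xAI."""
    if req.needs_live_search:
        return "grok-3"        # frontier reasoning with live-web grounding
    if req.high_volume or req.latency_sensitive:
        return "grok-3-mini"   # cost-optimized, fast responses
    return "grok-3"            # default to full frontier capability
```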
At a glance
| Capability | Value |
|---|---|
| Context window | 131,072 tokens |
| Multimodal input | Yes |
| Live search | Yes |
| Batch API | No |
| Dedicated deployment | No |
| Service tiers | standard |
| Data retention | standard |
| Regions | us |
Manifest revision cap_2026_06_09. The capability manifest is what tells the broker that xAI is the
only first-class option for live-grounded work and that background work must route elsewhere.
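Two entries in the table above act as hard limits on a request: the context window and the region list. The sketch below reports which limits a request would violate. It assumes, as is common but not confirmed here, that the reserved output budget shares the 131,072-token window with the prompt. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XaiCapabilities:
    # Values copied from the "At a glance" table (manifest revision cap_2026_06_09).
    context_window: int = 131_072
    regions: frozenset[str] = frozenset({"us"})


def blocking_reasons(caps: XaiCapabilities, prompt_tokens: int,
                     max_output_tokens: int, region: str) -> list[str]:
    """Return the capabilities a request would violate; an empty list means it fits."""
    reasons: list[str] = []
    # Assumption: prompt and reserved output share one context window.
    if prompt_tokens + max_output_tokens > caps.context_window:
        reasons.append("context_window")
    if region not in caps.regions:
        reasons.append("region")
    return reasons
```

For instance, a 120,000-token prompt with an 8,000-token output budget fits in `us`, while a 130,000-token prompt with a 2,000-token budget exceeds the window.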