Google Gemini

Gemini through Zumik - implicit and explicit caching at up to a 75% discount, the largest context window in the set at 1M+ tokens, first-class multimodal input, the Batch API at 50% off, manual cache clearing, and the conditions under which the broker routes here.

Google Gemini is the long-context, multimodal, low-friction option. It is the only first-class provider that does both implicit and explicit caching, it has the largest context window in the set at over 1M tokens, and it is the only one that supports manual cache clearing. For any workload with long repeated prompts and no hard requirement for a specific frontier model, Gemini implicit caching plus long context often delivers the best cost-per-useful-output of the managed providers.

It is available on both the managed-provider and BYOK profiles. Requests that resolve here report Agent-Resolved-Provider: gemini.
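As a minimal sketch of how a client might confirm where a request resolved, the snippet below reads Agent-Resolved-Provider from a plain dictionary of headers. It assumes the value arrives as an HTTP-style header on the response, which this page does not state outright; the function name and the sample headers are hypothetical.

```python
from typing import Mapping, Optional


def resolved_provider(headers: Mapping[str, str]) -> Optional[str]:
    """Return the provider named in Agent-Resolved-Provider, or None if absent.

    Assumption: the value is delivered as a header. Header names are
    compared case-insensitively, as is usual for HTTP headers.
    """
    for name, value in headers.items():
        if name.lower() == "agent-resolved-provider":
            return value.strip().lower()
    return None


# Hypothetical response headers, for illustration only.
sample_headers = {
    "Content-Type": "application/json",
    "Agent-Resolved-Provider": "gemini",
}

if resolved_provider(sample_headers) == "gemini":
    print("request resolved to Gemini")
```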

Caching economics

Gemini gives you two mechanisms, and you can use either or both:

  • Implicit caching applies automatically when a request's prefix (1,024+ tokens) matches that of a recent request, giving up to a 75% discount with zero client-side instrumentation. This is the lowest-friction cost optimization across all providers - no breakpoints, no cache IDs.
  • Explicit caching (the Context Caching API) lets you manually cache a content block - a document, an instruction set, a tool list - and reference it by cache ID in later requests. It is billed by cache storage duration. Use it when the same large block is referenced across hundreds of requests per day.

Fact                        Value
Cache type                  Both (implicit + explicit)
Minimum cacheable prefix    1,024 tokens
Cache-read discount         up to 75%
Default TTL                 3,600 seconds (1h)
Extended TTL                86,400 seconds (24h)
Cached-token reporting      Yes
Manual cache clear          Supported
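
To make the cache-read arithmetic concrete, here is a rough estimate of input cost for one request with a cached prefix. It is a sketch under stated assumptions: the per-million-token price is a made-up placeholder, the full 75% is applied even though the discount is documented only as "up to" that figure, and storage charges for explicit caching are left out.

```python
MIN_CACHEABLE_PREFIX = 1_024  # minimum cacheable prefix, per the caching facts
MAX_CACHE_DISCOUNT = 0.75     # documented as "up to 75%", so an upper bound


def estimate_input_cost(prompt_tokens: int,
                        matched_prefix_tokens: int,
                        price_per_million: float,
                        discount: float = MAX_CACHE_DISCOUNT) -> float:
    """Estimate the input cost of one request given a matched cached prefix.

    price_per_million is a hypothetical price, not a published rate.
    """
    if not 0 <= matched_prefix_tokens <= prompt_tokens:
        raise ValueError("matched prefix must be between 0 and prompt_tokens")
    # A prefix shorter than the minimum is not cacheable, so nothing is discounted.
    cached = matched_prefix_tokens if matched_prefix_tokens >= MIN_CACHEABLE_PREFIX else 0
    uncached = prompt_tokens - cached
    billable_tokens = uncached + cached * (1 - discount)
    return billable_tokens * price_per_million / 1_000_000


# Hypothetical example: 10,000-token prompt, 8,000-token matched prefix,
# placeholder price of 1.00 per million input tokens.
with_cache = estimate_input_cost(10_000, 8_000, 1.00)
without_cache = estimate_input_cost(10_000, 0, 1.00)
print(f"with cache: {with_cache:.4f}, without: {without_cache:.4f}")
```

In this example only the matched 8,000 tokens are discounted, so the saving on the whole request is 60% rather than 75%. The longer the repeated prefix relative to the full prompt, the closer the saving gets to the cap.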

Gemini is the one first-class provider here that reports manual_cache_clear_supported. That lets a managed-provider purge reach a stronger guarantee for Gemini-cached state than the expiry-bound best effort other managed providers are capped at. See retention and purge.
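
The distinction above can be sketched as a lookup on the capability flag. Only manual_cache_clear_supported comes from this page; the dictionary shape and the two returned labels are hypothetical names used for illustration, not values defined by Zumik.

```python
from typing import Mapping


def purge_guarantee(capabilities: Mapping[str, bool]) -> str:
    """Pick a purge-guarantee label from a provider capability record.

    The labels are illustrative placeholders; only the flag name
    manual_cache_clear_supported is taken from the documentation.
    """
    if capabilities.get("manual_cache_clear_supported", False):
        # Cached state can be cleared on request rather than left to expire.
        return "manual_clear"
    # Without manual clearing, a purge can only wait for cache expiry.
    return "expiry_bound_best_effort"


# Hypothetical capability records.
gemini_caps = {"manual_cache_clear_supported": True}
other_caps = {"manual_cache_clear_supported": False}

print(purge_guarantee(gemini_caps))
print(purge_guarantee(other_caps))
```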

Batch, long context, and multimodal

The Batch API delivers a 50% cost reduction at up to 24h turnaround for bulk inference with async result delivery. The 1M+ token context window lets a task load a full document and skip a RAG retrieval step entirely. And Gemini has first-class vision, audio, video, and code input - the broker routes multimodal workloads here by default.
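
Two quick checks follow from these figures: whether a document fits in the window without retrieval, and what a job costs at the Batch discount. The sketch below assumes that reserved output tokens count against the same window, which this page does not specify, and uses a placeholder standard cost.

```python
CONTEXT_WINDOW_TOKENS = 1_048_576  # context window listed for Gemini on this page
BATCH_DISCOUNT = 0.50              # Batch API cost reduction


def fits_in_context(document_tokens: int, reserved_output_tokens: int = 0) -> bool:
    """Return True if the document (plus any reserved output) fits the window.

    Assumption: output tokens share the window with input tokens.
    """
    return document_tokens + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS


def batch_cost(standard_cost: float) -> float:
    """Return the cost of a job at the Batch discount."""
    return standard_cost * (1 - BATCH_DISCOUNT)


# Hypothetical job: a 900,000-token document with 8,192 tokens reserved for
# output, and a placeholder standard cost of 12.00.
print(fits_in_context(900_000, 8_192))
print(batch_cost(12.00))
```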

When the broker routes here

Long repeated prompts

Workloads with prompts of 1,024+ tokens that do not need explicit cache management - implicit caching delivers up to 75% off automatically.

Very long context

Loading a whole document into the 1M+ window to avoid building a retrieval pipeline.

Multimodal input

Vision, audio, video, and code input route here by default.

Large batch jobs

Bulk async inference at the 50% Batch discount.
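
The four cases above can be read as a set of checks on a workload. The following is an illustrative sketch, not the broker's actual routing logic: the Workload fields, the modality names, and the 200,000-token threshold for "very long context" are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

MIN_CACHEABLE_PREFIX = 1_024        # from the caching facts on this page
LONG_CONTEXT_THRESHOLD = 200_000    # hypothetical cutoff for "very long context"
MULTIMODAL_INPUTS = frozenset({"vision", "audio", "video", "code"})


@dataclass
class Workload:
    """Hypothetical description of a workload, for illustration only."""
    prompt_tokens: int
    repeated_prefix: bool = False
    needs_explicit_cache_management: bool = False
    modalities: FrozenSet[str] = frozenset({"text"})
    batch: bool = False


def gemini_routing_reasons(w: Workload) -> List[str]:
    """List which of the four documented cases a workload matches."""
    reasons = []
    if (w.repeated_prefix
            and w.prompt_tokens >= MIN_CACHEABLE_PREFIX
            and not w.needs_explicit_cache_management):
        reasons.append("long repeated prompts")
    if w.prompt_tokens >= LONG_CONTEXT_THRESHOLD:
        reasons.append("very long context")
    if w.modalities & MULTIMODAL_INPUTS:
        reasons.append("multimodal input")
    if w.batch:
        reasons.append("large batch jobs")
    return reasons


job = Workload(prompt_tokens=350_000, repeated_prefix=True,
               modalities=frozenset({"text", "video"}), batch=True)
print(gemini_routing_reasons(job))
```

An empty list means none of the four cases applies. It does not by itself say where the broker would send the request.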

At a glance

Capability              Value
Context window          1,048,576 tokens
Multimodal input        Yes
Live search             No
Dedicated deployment    No
Service tiers           standard
Data retention          standard
Regions                 us, global

Manifest revision cap_2026_06_09. The capability manifest records that Gemini holds the largest context window and the only manual cache clear in the set, both of which shape where the broker sends long-context and multimodal work.
