Model aliases
Request a capability instead of a model string. Immutable alias releases, deterministic resolution, per-request resolution records, and no silent drift.
A model alias lets you request a capability - auto, code.best, auto.fast - instead of pinning a specific provider model. The platform resolves the alias to a concrete target at request time, applying current routing policy and capability manifests. The catch is that an alias is only useful if it is reproducible, and that is what the rest of this page is about.
The primary aliases
These aliases are powered by the cost-aware router. Each alias is an operating point on one trained policy (a candidate subset plus a quality bar), so it routes every request to the cheapest candidate model expected to do that kind of job well, and escalates to a stronger model when the cheaper one is not expected to.
| Alias | Intent |
|---|---|
| auto | The default. Cheapest model that will do the job well. |
| auto.best | Quality-first; escalates to the strongest model readily. |
| auto.fast | Cheapest, smallest acceptable model for simple work. |
| code.best | Hard coding and agentic work; strong coding models only. |
| code.fast | Quick edits and short coding tasks; small, fast coders. |
| reasoning.best | Deep reasoning; the strongest models. |
| vision.balanced | Multimodal tasks (routed to a managed multimodal provider). |
Older names (auto.balanced, auto.cheapest, code.balanced, code.cheapest, auto.cost) keep working as synonyms, now served by the same cost-aware router. auto.semantic remains available as a heuristic, no-model-call alternative.
Immutable alias releases
A live alias does not point at a model. It points at exactly one immutable alias release: a frozen resolution policy with a fixed set of weighted targets.
{
  "id": "alr_2026_06_09_003",
  "object": "model_alias_release",
  "alias": "code.fast",
  "status": "active",
  "policy_revision": "policy_7",
  "capability_manifest_revision": "cap_2026_06_09",
  "targets": [
    { "execution_profile": "managed_provider", "provider": "openai",
      "model": "gpt-4o", "model_revision": "2025-01-01", "weight": 60 },
    { "execution_profile": "managed_provider", "provider": "fireworks_ai",
      "model": "llama-v4-maverick", "model_revision": "2025-06-01", "weight": 40 }
  ]
}
Releases are immutable. A provider-model revision change creates a new release; it never edits an existing one. A "frozen" alias never drifts because there is nothing mutable to drift.
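The immutability rule can be modeled client-side with frozen dataclasses. This is a minimal sketch, not the platform's implementation: the class names, the `with_model_revision` helper, the new release ID, and the `2025-02-01` revision are all hypothetical; only the field names come from the release object shown above.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Target:
    execution_profile: str
    provider: str
    model: str
    model_revision: str
    weight: int


@dataclass(frozen=True)
class AliasRelease:
    id: str
    alias: str
    policy_revision: str
    capability_manifest_revision: str
    targets: tuple[Target, ...]  # a tuple, so the target set cannot be mutated


def with_model_revision(release, new_id, provider, model, model_revision):
    """Return a NEW release with one target's revision changed.

    Hypothetical helper: the original release object is left untouched.
    """
    targets = tuple(
        replace(t, model_revision=model_revision)
        if (t.provider, t.model) == (provider, model)
        else t
        for t in release.targets
    )
    return replace(release, id=new_id, targets=targets)


old = AliasRelease(
    id="alr_2026_06_09_003",
    alias="code.fast",
    policy_revision="policy_7",
    capability_manifest_revision="cap_2026_06_09",
    targets=(
        Target("managed_provider", "openai", "gpt-4o", "2025-01-01", 60),
        Target("managed_provider", "fireworks_ai", "llama-v4-maverick", "2025-06-01", 40),
    ),
)

# A revision change yields a new release with a new (made-up) ID.
new = with_model_revision(old, "alr_example_next", "openai", "gpt-4o", "2025-02-01")
```

Assigning to a field of `old` raises `FrozenInstanceError`, which mirrors the rule that an existing release is never edited in place.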
Resolution is deterministic, not random
When a release has weighted targets, resolution is weighted but deterministic. The platform uses a stable per-request seed - for example a hash of the response ID - to pick a target, so the same request always resolves the same way. This is what makes a replay faithful: re-running the request reproduces the routing decision exactly, rather than rolling the dice again.
In the example above, the seed space splits 60/40 between OpenAI and Fireworks. A given seed always lands in the same partition.
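A weighted but deterministic pick can be sketched as follows. The hash function (SHA-256), the way the digest is reduced to a point in the weight range, and the `resp_...` seed string are assumptions made for illustration; the page only states that a stable per-request seed is used.

```python
import hashlib

# Weights taken from the example release above.
TARGETS = [
    {"provider": "openai", "model": "gpt-4o", "weight": 60},
    {"provider": "fireworks_ai", "model": "llama-v4-maverick", "weight": 40},
]


def pick_target(targets, seed_material: str):
    """Deterministically pick one weighted target from a stable seed string."""
    total = sum(t["weight"] for t in targets)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    digest = hashlib.sha256(seed_material.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and map it
    # into [0, total). Assumed reduction; the real scheme is not documented here.
    point = int.from_bytes(digest[:8], "big") % total
    cumulative = 0
    for t in targets:
        cumulative += t["weight"]
        if point < cumulative:
            return t
    raise AssertionError("unreachable: point is always below total")


# Same seed, same target, every time.
first = pick_target(TARGETS, "resp_example_123")
second = pick_target(TARGETS, "resp_example_123")
```

In this sketch, points 0 to 59 fall in the OpenAI partition and 60 to 99 in the Fireworks partition, so a replay that reuses the seed lands on the same target.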
Semantic routing
A release can route by request signal instead of by weight. A semantic release tags each target with a tier (cheap, balanced, reasoning, vision); at request time the platform classifies the prompt - size, code presence, reasoning intent, vision, and conservative prompt-injection markers - and a policy maps those signals to a tier. Classification is heuristic and deterministic (no extra model call), so a semantic decision is just as reproducible and auditable as a weighted one.
The seeded auto.semantic alias uses the built-in policy: a simple question routes to the cheap tier, a request to prove or optimize routes to the reasoning tier, a vision request to the vision tier, and a strong, multi-marker prompt-injection attempt is blocked with a request_blocked (403) error. The block threshold is deliberately high so ordinary security, debugging, and CTF work is never refused on a heuristic; a moderate safety signal routes to a stronger model rather than denying.
{
  "requested_model": "auto.semantic",
  "resolved_provider": "anthropic",
  "resolved_model": "claude-3-7-sonnet",
  "resolution_reason": "semantic_routing:reasoning_intent"
}
The resolution_reason records which rule fired, so a semantic decision traces back to the signal that caused it.
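To make the idea of heuristic, no-model-call classification concrete, here is a toy router. It is a sketch under heavy assumptions: the marker list, the block threshold, the length cutoff, the order in which rules are checked, and every reason string except `semantic_routing:reasoning_intent` are invented for illustration and do not describe the built-in policy.

```python
import re

# Hypothetical prompt-injection markers; the real marker set is not documented here.
INJECTION_MARKERS = (
    "ignore previous instructions",
    "disregard the system prompt",
    "reveal your system prompt",
)
BLOCK_THRESHOLD = 3  # assumed: block only when several markers co-occur
REASONING_WORDS = re.compile(r"\b(prove|optimize)\b", re.IGNORECASE)
CODE_FENCE = "`" * 3  # crude code-presence signal
LONG_PROMPT_CHARS = 2000  # assumed size cutoff


class RequestBlocked(Exception):
    """Mirrors the request_blocked (403) error named in the text."""
    status = 403
    code = "request_blocked"


def route(prompt: str, has_image: bool = False) -> tuple[str, str]:
    """Map request signals to (tier, resolution_reason) with no model call."""
    text = prompt.lower()
    injection_hits = sum(marker in text for marker in INJECTION_MARKERS)
    if injection_hits >= BLOCK_THRESHOLD:
        raise RequestBlocked("multiple prompt-injection markers")
    if has_image:
        return "vision", "semantic_routing:vision"
    if injection_hits > 0:
        # A moderate safety signal escalates instead of denying.
        return "reasoning", "semantic_routing:safety_signal"
    if REASONING_WORDS.search(prompt):
        return "reasoning", "semantic_routing:reasoning_intent"
    if CODE_FENCE in prompt or len(prompt) > LONG_PROMPT_CHARS:
        return "balanced", "semantic_routing:code_or_long_prompt"
    return "cheap", "semantic_routing:simple"
```

Because the function depends only on its inputs, the same prompt always yields the same tier and reason, which is the property that keeps a semantic decision replayable.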
Resolution records
Every request records exactly how its alias resolved, so routing is auditable:
{
  "requested_model": "code.fast",
  "alias_release_id": "alr_2026_06_09_003",
  "resolved_execution_profile": "managed_provider",
  "resolved_provider": "openai",
  "resolved_model": "gpt-4o",
  "resolved_model_revision": "2025-01-01",
  "capability_manifest_revision": "cap_2026_06_09",
  "resolution_reason": "lowest_expected_latency_under_policy"
}
The alias_release_id appears in your logs, so you can always trace a response back to the precise policy that routed it.
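As an illustration of auditing from logs, the sketch below groups resolution records by release and counts which concrete targets each release routed to. It assumes the records are available as one JSON object per line with the fields shown above; that log format and the function name are assumptions, not something this page specifies.

```python
import json
from collections import Counter


def targets_by_release(log_lines):
    """Count resolved (provider, model, revision) targets per alias release."""
    counts = {}
    for line in log_lines:
        rec = json.loads(line)
        target = (
            rec["resolved_provider"],
            rec["resolved_model"],
            rec["resolved_model_revision"],
        )
        counts.setdefault(rec["alias_release_id"], Counter())[target] += 1
    return counts


# Two sample lines built from the record above.
sample = json.dumps({
    "requested_model": "code.fast",
    "alias_release_id": "alr_2026_06_09_003",
    "resolved_provider": "openai",
    "resolved_model": "gpt-4o",
    "resolved_model_revision": "2025-01-01",
})
summary = targets_by_release([sample, sample])
```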
No silent drift
These rules together eliminate the failure mode where a model quietly changes underneath you:
- A live alias points to one immutable release. The release is the source of truth for how the alias resolves right now.
- A provider revision change forces a new release. You can see the new release ID; nothing changes invisibly.
- Replays pin a release or a concrete target. A replay re-runs against the exact release it recorded, not against "whatever the alias means today."
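These rules suggest two small client-side checks, sketched here under assumptions. `release_changes` reports every point at which an alias moved to a new release, given resolution records in chronological order. `pin_for_replay` picks what a replay should pin. Both function names and the shape of the returned pin are hypothetical; only the record field names come from this page.

```python
def release_changes(records):
    """Return (alias, previous_release_id, new_release_id) for each release change.

    Assumes records are ordered oldest first and carry alias_release_id.
    """
    last_seen = {}
    changes = []
    for rec in records:
        alias = rec["requested_model"]
        release = rec["alias_release_id"]
        previous = last_seen.get(alias)
        if previous is not None and previous != release:
            changes.append((alias, previous, release))
        last_seen[alias] = release
    return changes


def pin_for_replay(rec):
    """Prefer pinning the recorded release; fall back to the concrete target."""
    if "alias_release_id" in rec:
        return {"pin": "release", "alias_release_id": rec["alias_release_id"]}
    return {
        "pin": "target",
        "provider": rec["resolved_provider"],
        "model": rec["resolved_model"],
        "model_revision": rec.get("resolved_model_revision"),
    }
```

A non-empty result from `release_changes` is the visible signal that a provider revision change produced a new release, which is the opposite of silent drift.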
/v1/models returns the small, exact upstream object shape (id, object, created, owned_by). Rich alias metadata - releases, targets, capability revisions - lives on /v2/model-aliases, in keeping with the API surface extension rule.
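A client can check that a /v1/models entry carries only the four upstream fields and nothing alias-specific. This is a small sketch: the helper name and the sample values are made up, and the check covers key names only, not value types.

```python
UPSTREAM_MODEL_KEYS = {"id", "object", "created", "owned_by"}


def is_upstream_model_shape(obj: dict) -> bool:
    """True only if obj has exactly the four upstream keys and no extras."""
    return set(obj) == UPSTREAM_MODEL_KEYS


# Hypothetical entries: the first matches the shape, the second leaks alias metadata.
plain = {"id": "code.fast", "object": "model", "created": 0, "owned_by": "example"}
rich = {**plain, "targets": []}
```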