LLM routing

Persist State Across 17,000+ Models

Backboard gives you a single, portable API to 17,000+ LLMs across providers. Bring your own keys from OpenAI, Anthropic, Google Gemini, Cohere, xAI, OpenRouter, and more. Route by cost, speed, quality, or capability—with built‑in state management and Adaptive Context Management, no token markup, and access to many free models.

LLM routing, without the glue code

What is LLM routing on Backboard?

Backboard lets you call 17,000+ models from a single endpoint and change which model you use at any time, without rewriting your app. Instead of hard‑coding every provider's SDK and payload quirks, you:

Integrate one unified API

Replace every provider SDK with a single OpenAI-compatible endpoint. One integration unlocks OpenAI, Anthropic, Google Gemini, Cohere, xAI, and thousands more — no rewrites needed when you switch models.

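As a minimal sketch of what "one integration" means in practice: the request shape below is the standard OpenAI-compatible chat payload, and only the model string changes between providers. The endpoint URL is a placeholder, not Backboard's real address.

```python
import json

# Placeholder endpoint -- check the Backboard docs for the real base URL.
BACKBOARD_URL = "https://api.backboard.example/v1/chat/completions"

def chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-compatible chat payload; only the model string varies."""
    return {
        "model": model,  # e.g. "openai/gpt-4o" or "anthropic/claude-3-7-sonnet"
        "messages": [{"role": "user", "content": user_message}],
    }

# Same payload shape for any provider -- swap the model string, nothing else.
openai_req = chat_request("openai/gpt-4o", "Summarize this ticket.")
claude_req = chat_request("anthropic/claude-3-7-sonnet", "Summarize this ticket.")
print(json.dumps(openai_req, indent=2))
```

Because the payload is identical across providers, switching from GPT to Claude is a one-line change.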

Choose models with a simple string or routing rule

Pass a model name like "openai/gpt-4o" or a routing rule like "fastest" or "cheapest". Swap models in a config change without touching your app logic.

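To illustrate how a routing rule differs from a concrete model name, here is a toy resolver over an invented catalog. The model names, prices, and latencies are illustrative only, and the real resolution happens server-side.

```python
# Toy model catalog -- illustrative numbers, not real pricing or latency data.
CATALOG = {
    "openai/gpt-4o": {"cost_per_1k": 0.005, "latency_ms": 800},
    "anthropic/claude-3-7-sonnet": {"cost_per_1k": 0.003, "latency_ms": 900},
    "google/gemini-flash": {"cost_per_1k": 0.0005, "latency_ms": 300},
}

def resolve(model_or_rule: str) -> str:
    """Treat the string as a routing rule if recognized, else a literal model name."""
    if model_or_rule == "cheapest":
        return min(CATALOG, key=lambda m: CATALOG[m]["cost_per_1k"])
    if model_or_rule == "fastest":
        return min(CATALOG, key=lambda m: CATALOG[m]["latency_ms"])
    return model_or_rule  # a concrete model string passes through unchanged
```

Reading the string from config rather than hard-coding it is what makes the model swap a config change instead of a code change.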

Bring your own keys (BYOK) for providers you already use

Connect your existing API keys from OpenAI, Anthropic, Google, and other providers. Pay those providers directly at their listed rates — Backboard adds zero markup on tokens.

Let Backboard handle state, context, tools, and memory consistently

Backboard automatically persists conversation state, manages context windows, retrieves memory, and runs tools — consistently, regardless of which model is handling the request.

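A sketch of what a stateful request looks like from the client's side: because the server holds the history, each turn sends only a session identifier and the new message. The `session_id` field name is hypothetical, so check the API reference for the documented schema.

```python
# Hypothetical shape of a stateful request: the server holds the history,
# so each turn carries only the session id and the newest message.
def stateful_turn(session_id: str, model: str, new_message: str) -> dict:
    return {
        "session_id": session_id,  # hypothetical field name, for illustration
        "model": model,
        "messages": [{"role": "user", "content": new_message}],  # just this turn
    }

turn1 = stateful_turn("sess-42", "openai/gpt-4o", "My name is Ada.")
# The second turn switches models mid-conversation; the prior context
# follows the session id rather than being resent by the client.
turn2 = stateful_turn("sess-42", "anthropic/claude-3-7-sonnet", "What's my name?")
```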

LLM routing

Why engineers route through Backboard

From benchmark-leading memory to BYOK with no token markup — everything built into one stateful API.

One API, 17,000+ models, many free

Call 17,000+ models from OpenAI, Anthropic, Google, Mistral, Cohere, xAI, and more through one unified endpoint. Hundreds of free models available for experimentation and background tasks.

BYOK with no token markup

Connect your own API keys from any major provider and pay them directly at their published rates. Backboard adds zero token markup — ever.

Stateful by default

Every request is context-aware out of the box. Conversation state and session memory are handled automatically so you don't have to pass history manually on every call.

Adaptive context management built in

Backboard intelligently trims, summarizes, and prioritizes context to fit within any model's token window — preserving the most relevant information without manual tuning.

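To make the trimming part of this concrete, here is a deliberately naive context-fitting function. Real adaptive context management also summarizes and prioritizes; this sketch only shows dropping the oldest messages to stay under a token budget, with a crude characters-divided-by-four token estimate.

```python
def fit_context(messages, budget_tokens, count=lambda m: len(m["content"]) // 4):
    """Naive context fitting: keep the newest messages within a token budget.

    `count` is a crude token estimator (about 4 characters per token).
    This illustrates only trimming, not summarization or prioritization.
    """
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest-first
        tokens = count(msg)
        if used + tokens > budget_tokens:
            break  # oldest messages beyond the budget are dropped
        kept.append(msg)
        used += tokens
    return list(reversed(kept))  # restore chronological order
```

The point of doing this server-side is that the same conversation can be routed to a 8K-context model or a 1M-context model without the caller changing anything.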

Configurable memory, RAG, and tools on every route

Attach memory (lite or pro), RAG retrieval, and tool integrations to any route. Configure them once and they follow your requests across every model you route to.

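"Configure once, follows every model" might look like the fragment below. The field names are illustrative, not the documented schema; the point is that memory, RAG, and tools are declared per route, not per model.

```python
# Hypothetical route configuration -- field names are illustrative only.
# Memory, RAG, and tools are attached to the route, so they apply no
# matter which model the route resolves to.
route_config = {
    "model": "cheapest",                 # routing rule or concrete model string
    "memory": "pro",                     # "lite" or "pro"
    "rag": {"collection": "support-docs"},
    "tools": ["web_search"],
}
```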

Model-independent web search

Built-in real-time web search works across all 17,000+ models — no extra integration required. Available on every plan at no additional cost.

how it works

How LLM routing works

You call a single chat‑style endpoint and pass the model (or a routing rule), the session or conversation ID, and any optional tools: memory, RAG, web search, custom tools.

1. Resolve model

Backboard maps your model string or routing rule to the right provider and endpoint — whether that's a named model like "anthropic/claude-3-7-sonnet" or a policy like "cheapest with vision".

2. Apply state

Your session ID is used to load the relevant conversation history, memory entries, and tool state — so the chosen model receives full context without you passing it manually.

3. Fit context

Adaptive Context Management trims, summarizes, and prioritizes your loaded context to fit within the selected model's token window before the request is sent.

4. Run and return

The request is forwarded to the provider using your own API key (if BYOK), streamed back through Backboard, and the updated state is persisted for the next turn.

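The four steps can be sketched end to end with stubbed I/O. Everything here is illustrative rather than Backboard's implementation: the in-memory session store stands in for persisted state, a last-six-turns cut stands in for Adaptive Context Management, and the echo reply stands in for the provider call.

```python
# End-to-end sketch of the four routing steps, with stubbed provider I/O.
SESSIONS = {}  # in-memory stand-in for persisted conversation state

def route(session_id: str, model_or_rule: str, user_message: str) -> str:
    # 1. Resolve model (passthrough here; rules like "cheapest" would map to a name)
    model = model_or_rule
    # 2. Apply state: load prior history for this session
    history = SESSIONS.get(session_id, [])
    # 3. Fit context: keep the most recent turns (a crude trim stands in
    #    for real adaptive context management)
    context = history[-6:] + [{"role": "user", "content": user_message}]
    # 4. Run and return: a stubbed completion, then persist the updated state
    reply = f"[{model}] echo: {user_message}"  # stand-in for the real provider call
    SESSIONS[session_id] = context + [{"role": "assistant", "content": reply}]
    return reply
```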

routing patterns

Routing patterns you can implement

Same state, same memory, same tools—different models for different jobs.

Cost‑aware routing

Route simple or repetitive queries to cheaper, faster models and reserve expensive reasoning models for tasks that genuinely need them — automatically reducing cost without sacrificing quality.

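A cost-aware split can be as simple as a heuristic gate in front of the model string. The marker words and model names below are examples, not a recommended policy; production routers usually use a small classifier model instead.

```python
# Toy cost-aware router: a crude length/keyword heuristic decides whether
# a request needs an expensive reasoning model. Names are examples only.
CHEAP, EXPENSIVE = "google/gemini-flash", "openai/gpt-4o"

def pick_model(prompt: str) -> str:
    hard_markers = ("prove", "debug", "analyze", "step by step")
    if len(prompt) > 500 or any(m in prompt.lower() for m in hard_markers):
        return EXPENSIVE  # long or reasoning-heavy prompts get the big model
    return CHEAP          # everything else goes to the cheap, fast model
```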

Latency‑sensitive routing

Direct time-critical requests to the fastest available model for a given capability. Ideal for real-time chat, autocomplete, or user-facing features where response speed matters.

Capability‑based routing

Route by what a model is best at — vision, code generation, long context, multilingual, or function calling. Match task type to the model most likely to get it right.

Provider redundancy

Automatically fail over to an alternate provider if your primary model is rate-limited or unavailable. Keep your app running without manual intervention or downtime.

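The failover pattern itself is a short retry loop over an ordered model list. In this sketch `call_provider` is a stub for the real request, and `ProviderError` stands in for whatever rate-limit or outage errors the client surfaces.

```python
# Failover sketch: try providers in order, falling back on rate limits
# or outages. `call_provider` is a stub for the real API request.
class ProviderError(Exception):
    pass

def call_provider(model: str, prompt: str) -> str:
    raise ProviderError("stub")  # replace with a real provider call

def with_failover(models, prompt, call=call_provider):
    last_err = None
    for model in models:
        try:
            return call(model, prompt)       # first success wins
        except ProviderError as err:         # rate limit, 5xx, timeout, ...
            last_err = err                   # fall through to the next model
    raise last_err                           # every provider failed
```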

Why not just build your own router?

Wiring a couple of models is easy. The hard parts are what Backboard solves.

The hard parts:

Keeping state and memory consistent across models and providers

Handling different context windows without losing important info

Tracking cost, latency, and usage when logic is scattered

Making RAG, web search, and tools behave the same for every model

Managing multiple keys and pricing models without accidentally overpaying

Backboard gives you:

A unified API for 17,000+ models

BYOK support for major providers with no token markup

Access to many free models for experimentation and background work

Free state management and Adaptive Context Management baked in

Best‑in‑class configurable memory (lite and pro), plus RAG and web search

You integrate once and get world‑leading routing everywhere.

Get started with Backboard

Wire Backboard into one service today and unlock 17,000+ models, BYOK, stateful behavior, adaptive context, and many free models across your stack.

Built for focused work

Everything you need to build production-grade agent systems on a single, coherent API.

All systems operational

© 2026 Backboard.io
