PERSISTENT STATE NEWS

Cached Thoughts

Our thoughts and updates, cached for later retrieval.

Announcement

Mar 24, 2026

New: Automatic Context Window Management Across 17,000+ Models

Backboard now includes Adaptive Context Management, a system that automatically manages conversation state when your application moves between models with different context window sizes.

With access to 17,000+ LLMs on the platform, model switching is common. But context limits vary widely across models. What fits in one model may overflow another.

Until now, developers had to handle that manually.

Adaptive Context Management removes that burden. And it’s included for free with Backboard.

The Problem: Context Windows Are Inconsistent

Different models support different context window sizes. Some allow large conversations. Others are much smaller.

If an application starts a session on a large-context model and later routes a request to a smaller one, the total state can exceed what the new model can handle.

That state typically includes more than just chat messages:

  • system prompts

  • recent conversation turns

  • tool calls and tool responses

  • RAG context

  • web search results

  • runtime metadata

When that information exceeds the model’s limit, something must be removed or compressed.
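To make the overflow concrete, here is a rough sketch of measuring total state against a model's limit using the open source tiktoken tokenizer. The state fields and the 8,192-token limit are illustrative examples, not Backboard internals:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# Illustrative conversation state; real state also carries tool calls,
# web search results, and runtime metadata.
state = {
    "system_prompt": "You are a helpful research assistant.",
    "messages": ["...recent conversation turns..."],
    "rag_context": ["...retrieved document chunks..."],
}

def count_tokens(value) -> int:
    # Count tokens in a string, or recursively across a list of strings.
    if isinstance(value, str):
        return len(enc.encode(value))
    return sum(count_tokens(v) for v in value)

total = sum(count_tokens(v) for v in state.values())
context_limit = 8192  # varies widely from model to model

if total > context_limit:
    print(f"State ({total} tokens) overflows the {context_limit}-token window")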

Most platforms leave this responsibility to developers. That means writing logic for truncation, prioritization, summarization, and overflow handling.

In multi-model systems, that quickly becomes fragile.

Introducing Adaptive Context Management

Backboard now automatically handles context transitions when models change.

When a request is routed to a new model, Backboard dynamically budgets the available context window.

The system works as follows:

  • 20% of the model’s context window is reserved for raw, unmodified state

  • the remaining 80% is freed through intelligent summarization of everything else

Backboard first calculates how many tokens fit inside the 20% allocation. Within that space we prioritize the most important live inputs:

  • system prompt

  • recent messages

  • tool calls

  • RAG results

  • web search context

Whatever fits inside this budget is passed directly to the model.

Everything else is compressed.
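As a mental model, the budgeting step might look something like the sketch below. The function and variable names are hypothetical; only the 20% raw-state figure and the priority order come from the description above:

def budget_context(state_items, context_limit, count_tokens):
    # Keep high-priority items raw inside 20% of the window and mark the
    # rest for summarization. state_items is assumed to be pre-sorted by
    # priority: system prompt, recent messages, tool calls, RAG results,
    # web search context.
    raw_budget = int(context_limit * 0.20)
    raw, to_summarize, used = [], [], 0
    for item in state_items:
        tokens = count_tokens(item)
        if used + tokens <= raw_budget:
            raw.append(item)           # passed to the model verbatim
            used += tokens
        else:
            to_summarize.append(item)  # compressed before the call
    return raw, to_summarize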

Intelligent Summarization

When compression is required, Backboard summarizes the remaining conversation automatically.

The summarization pipeline follows a simple rule:

  1. First we attempt summarization using the model the user is switching to.

  2. If the summary still cannot fit within the available context, we fall back to the larger model previously in use to generate a more efficient summary.

This approach preserves the most important information while ensuring the final state fits inside the new model’s limits.

The process happens automatically inside the Backboard runtime.
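A minimal sketch of the two-step fallback rule, assuming a placeholder summarize() function (not a Backboard API):

def compress(overflow_items, target_model, previous_model,
             available_tokens, summarize, count_tokens):
    # Step 1: try summarizing with the model being switched to.
    summary = summarize(overflow_items, model=target_model)
    # Step 2: if it still does not fit, retry with the larger model
    # that was previously in use.
    if count_tokens(summary) > available_tokens:
        summary = summarize(overflow_items, model=previous_model)
    return summary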

You Should Rarely Hit 100% Context Again

Because Adaptive Context Management runs continuously during requests and tool calls, the system proactively reshapes the state before a context window is exhausted.

In practice this means your application should rarely reach the full limit of a model’s context window, even when switching models mid-conversation.

Backboard keeps the system stable so developers do not need to constantly monitor token overflow.

Developers Can See Exactly What Is Happening

We also expose context usage directly in the msg endpoint response, so developers can track how their application is using context in real time.

Example response:

"context_usage": {

 "used_tokens": 1302,

 "context_limit": 8191,

 "percent": 19.9,

 "summary_tokens": 0,

 "model": "gpt-4"

}

This makes it easy to monitor:

  • how much context is currently being used

  • how close a request is to the model’s limit

  • how many tokens were generated by summarization

  • which model is currently managing the context

Developers gain visibility without needing to build their own tracking systems.
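For example, a client could watch these fields and react before a request nears the limit. A minimal sketch, assuming only the response shape shown above (how the msg endpoint is called is left out):

def check_context(response: dict) -> None:
    usage = response["context_usage"]
    print(f"{usage['used_tokens']}/{usage['context_limit']} tokens "
          f"({usage['percent']}%) on {usage['model']}")
    if usage["summary_tokens"] > 0:
        print(f"{usage['summary_tokens']} tokens came from summarization")
    if usage["percent"] > 90:
        print("warning: approaching the model's context limit")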

The Bigger Idea

Backboard was designed so developers can treat models as interchangeable infrastructure.

But that only works if state moves safely with the user.

Adaptive Context Management is another step toward that goal. Applications can move freely across thousands of models while Backboard ensures the conversation state always fits the model being used.

Developers focus on building. Backboard handles the context.

Next Steps

Adaptive Context Management is available today through the Backboard API.

Start building at docs.backboard.io

Changelog

Jan 28, 2026

17,000+ LLMs Now Available on Backboard

We have expanded the Backboard model ecosystem to include over 17,000 large language models, giving independent developers access to one of the largest and most flexible AI model collections available through a single API.

A significant portion of these models come from our partners at Featherless, who specialize in curating and maintaining high-quality open source models for real-world workloads.

What this unlocks for builders

Real choice, not forced defaults
With access to 17,000+ models, you are no longer constrained to a short list of general-purpose LLMs. You can choose models based on cost, speed, size, or specialization, and switch them without reworking your application.

Purpose-built open source models
Many of the newly available models are optimized for narrow tasks such as coding, reasoning, classification, summarization, and domain-specific inference. For independent developers, this often delivers better results with lower latency and cost.

Tool-capable models at scale
Approximately 60 percent of the models support custom tools, including retrieval-augmented generation, search, and external function calls. This enables builders to create agents and workflows that retrieve data, take actions, and reason across systems.

Clear model discoverability
All models that support tool calling are documented in the Backboard model library. You can quickly see which models work with RAG, search, and custom tools before you build.

Experiment freely, optimize continuously
With this breadth of models, experimentation becomes a first-class workflow. You can benchmark models against real use cases, iterate quickly, and evolve your stack as open source models improve.

One API, consistent memory and state
Every model runs through the same Backboard API, with consistent handling of memory, state, routing, and tools. You get flexibility without added complexity.

Example: a lightweight research agent

Consider a simple research agent built by an independent developer:

  • A small, fast open source model from Featherless handles query understanding and routing

  • A tool-capable model performs retrieval over documentation or notes using RAG

  • A separate summarization model produces concise, structured outputs

Because all three models are available behind one Backboard API, the developer can mix and match models without managing multiple SDKs or infrastructure. Memory and state persist across steps, and models can be swapped as better open source options appear, without changing the agent’s architecture.

This kind of setup is often cheaper, faster, and easier to tune than relying on a single large model for every task.
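As a sketch of what that wiring might look like, here is a hypothetical client with invented model identifiers and method names; see docs.backboard.io for the real API surface:

def research(client, question: str) -> str:
    # 1. A small, fast model interprets and routes the query.
    plan = client.chat(model="small-router-model",
                       messages=[{"role": "user", "content": question}])
    # 2. A tool-capable model retrieves supporting context via RAG.
    findings = client.chat(model="tool-capable-model",
                           messages=[{"role": "user", "content": plan}],
                           tools=["rag"])
    # 3. A dedicated summarization model produces the structured answer.
    return client.chat(model="summarizer-model",
                       messages=[{"role": "user",
                                  "content": f"Summarize:\n{findings}"}])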

Partner spotlight: Featherless

Featherless provides a deep catalog of specialized open source models, many of which are designed to work seamlessly with tools and external systems. Their focus on practical, task-specific models makes it easier for independent developers to build efficient, modular AI applications.

Get started

  • Explore the model library to find tool-capable models

  • Test specialized open source models for cost and performance gains

  • Build and route across multiple models using a single Backboard integration

This update expands what independent developers can build on Backboard while keeping the experience simple and unified.

Announcement

Feb 19, 2026

Understanding Backboard's AI Ecosystem: State, RAG, and Memory

We get this question a lot, so I thought I'd put together brief definitions of state, RAG, and memory, along with how they differ.

In the rapidly evolving world of AI, understanding the core components that power advanced systems is crucial. At Backboard, we're building on a foundation of sophisticated AI capabilities, and three key concepts are central to our approach: State, RAG (Retrieval-Augmented Generation), and Memory. While these terms are often used in AI discussions, their specific application and integration within Backboard's ecosystem are what set our technology apart.

What is State?

In essence, State refers to the current condition or status of an application or system at any given moment. Think of it as the immediate context. In the realm of AI, this often pertains to the ongoing conversation, the current configuration of an AI agent, or the immediate data it's processing. Our recent launch of Alpha (Stateful API + RAG) in late 2025 underscores Backboard's commitment to effectively managing and utilizing this dynamic state, ensuring our AI can operate with real-time awareness.

What is RAG (Retrieval-Augmented Generation)?

RAG is a powerful technique that significantly enhances the knowledge base of Large Language Models (LLMs). It works by allowing an LLM to retrieve relevant information from an external data source before it generates a response. This is critical because it enables our LLMs to access and incorporate up-to-date, domain-specific, or proprietary information that they weren't originally trained on. For Backboard, integrating RAG means our AI can provide more accurate, relevant, and contextually aware outputs, drawing from the most pertinent information available.
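In code, the pattern reduces to retrieve-then-generate. The sketch below uses placeholder retrieve() and generate() functions, not Backboard APIs:

def rag_answer(question: str, retrieve, generate) -> str:
    docs = retrieve(question, top_k=3)  # pull relevant external data
    context = "\n\n".join(docs)
    prompt = (f"Answer using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return generate(prompt)  # response grounded in retrieved facts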

What is Memory?

Memory is a broader and more encompassing concept than RAG. In AI, Memory refers to a system's ability to store, process, and recall past information, interactions, or experiences. This capability is fundamental for enabling:

  • Conversational Continuity: Remembering previous turns in a dialogue.

  • Personalized Interactions: Tailoring responses based on past user preferences or behaviors.

  • Learning Over Time: Improving performance and understanding through accumulated experience.

Backboard's strategic roadmap prominently features advancements in Memory, with the planned releases of Portable Memory in October 2025 and Infinite Memory in December 2025. These initiatives highlight our dedication to developing sophisticated memory systems that allow our AI to learn, adapt, and retain context over extended periods.

The Interplay: How They Differ and Work Together

While RAG, State, and Memory are distinct, they are deeply interconnected and essential for building intelligent AI systems:

  • RAG is a specific method for enriching an LLM's immediate response by accessing external data.

  • Memory is a more comprehensive system for preserving and recalling past information, enabling long-term context and learning.

  • State describes the current condition of the system at any given point in time, which is influenced by both RAG's retrieval and Memory's recall.

Backboard leverages the synergy between these components. RAG provides immediate, relevant data, while Memory ensures that the AI understands the ongoing context and can recall past interactions. The State of the system is continuously updated by these processes, allowing Backboard's AI to be both knowledgeable in the moment and contextually aware over time.
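A toy illustration of that interplay in a single turn, with every name invented for the example rather than taken from Backboard's design:

def handle_turn(user_msg, state, memory, retrieve, generate):
    recalled = memory.recall(user_msg)  # Memory: long-term recall of the past
    docs = retrieve(user_msg)           # RAG: fresh external knowledge
    state["messages"].append(user_msg)  # State: the current condition
    reply = generate(state, recalled, docs)
    memory.store(user_msg, reply)       # Memory accumulates over time
    state["messages"].append(reply)     # State updated by this turn
    return reply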

By mastering the interplay of State, RAG, and Memory, Backboard is building AI that is not only intelligent but also deeply understanding and continuously learning. This forms the backbone of our mission to deliver unparalleled AI solutions.
