PERSISTENT STATE NEWS

Cached Thoughts

Our thoughts and updates, cached for later retrieval.

Announcement

Nov 16, 2025

What is AI Memory, Really?

AI memory is the system that allows models to preserve information over time. It includes parametric memory inside the model weights and non-parametric memory stored outside the model through tools like databases, embeddings, and state layers. Memory is still one of the hardest unsolved problems in AI, and most approaches fail at scale. Backboard treats memory as a configurable infrastructure layer designed for accuracy, persistence, and cross-model continuity.

Why Memory Matters

Models are strong at generating answers but weak at remembering. When they forget context, users repeat themselves, workflows break, and trust drops. Even models with 1M-token windows cannot reliably maintain long-term context. Research from Anthropic and OpenAI shows accuracy decay as conversations grow due to approximate attention and compression limits. Larger windows help, but they do not replace persistent memory.

Parametric vs Non-Parametric Memory

Parametric Memory

• Stored inside model weights
• Learned during training
• Static and difficult to update
• Covers general knowledge but not personal or session-specific information

Non-Parametric Memory

• Stored outside the model
• Dynamic, persistent, controllable
• Includes transcripts, embeddings, session data, threads, and structured state
• Powers RAG, agent architectures, and context managers

Non-parametric systems tend to fail when scaling to millions of tokens, especially when retrieval is inconsistent or state management is improvised.

Why Memory Is Hard

• Scale: Storing data is cheap. Finding the correct slice is not.
• Retrieval: Semantic search fails when the query is phrased differently from how the stored content was embedded.
• State Management: Most agents collapse under long histories due to drift and noisy context injection.
• Privacy: Scattered storage across tools creates compliance issues.

How Backboard Solves the Problem

Backboard is built around a principle: memory should behave like a reliable, configurable database for AI.

Stateful Threads

Each conversation or agent runs inside a thread with persistent continuity. Developers get stable long-term context without manual stitching.
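
What this looks like in practice: a minimal sketch in Python. The base URL, endpoint paths, payload fields, and auth header here are illustrative assumptions, not Backboard's documented API; check the API docs for the real shapes.

import requests

BASE = "https://api.backboard.io"                    # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # assumed auth scheme

# Create a thread once; its ID becomes the handle to long-term context.
# The /api/threads path and payload fields are hypothetical.
thread = requests.post(f"{BASE}/api/threads", headers=HEADERS, json={}).json()

# Send a message today...
requests.post(
    f"{BASE}/api/threads/{thread['id']}/messages",
    headers=HEADERS,
    json={"role": "user", "content": "My deploy target is us-east-1."},
)

# ...and resume the same thread next week. Prior context is carried by the
# thread itself, with no manual transcript stitching on the client.
requests.post(
    f"{BASE}/api/threads/{thread['id']}/messages",
    headers=HEADERS,
    json={"role": "user", "content": "Which region do I deploy to?"},
)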

Portable Memory

Memory follows the user across 2,200+ models. This eliminates vendor lock-in and enables optimal routing.

Persistent Storage With High Recall

Everything is stored unless configured otherwise. Retrieval accuracy remains high thanks to configurable embedding models, vector DBs, and dimensions. Backboard currently holds the world’s highest validated LoCoMo score for long-context memory.

Configurability

Memory can be tuned per use case.
Examples:
• strict recall vs broader semantic recall
• selective write rules
• custom embedding models and storage
• fine-grained control of context injection
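
For instance, per-assistant tuning might look like the payload below. The embedding fields are documented in the Nov 10 changelog entry further down; memory_mode (Auto, Read-only, Off) and the recall and write fields are hypothetical names used only to illustrate the knobs listed above.

# Hypothetical tuning payload; only the embedding fields are documented.
assistant_config = {
    "name": "Support Agent",
    "embedding_provider": "openai",                    # documented field
    "embedding_model_name": "text-embedding-3-large",  # documented field
    "memory_mode": "auto",           # hypothetical: Auto / Read-only / Off
    "recall_strategy": "strict",     # hypothetical: strict vs broad recall
    "write_rules": ["user_facts"],   # hypothetical: selective write rules
}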

Production Reliability

Backboard includes a unified API, privacy controls, anonymization, and reproducible benchmarks. It removes the need for custom glue code.

How Backboard Compares to Other Approaches

RAG

RAG is good for document lookup, not long-term memory.
• Strength: factual retrieval from known sources
• Weakness: poor with unstructured conversational history, drift, and personal context
Backboard can use RAG components, but it layers stateful threads and persistent memory on top to maintain continuity across tasks.

MemGPT

MemGPT introduced the idea of hierarchical memory with a scratchpad and long-term store.
• Strength: creative architecture for dynamic memory management
• Weakness: heavy prompting logic, custom reasoning loops, difficult to operationalize
Backboard takes the same core idea but delivers it as an API with configurable memory, multiple storage options, and cross-model portability.

Letta

Letta focuses on agent state, tool usage, and planning.
• Strength: strong agent workflows and tool orchestration
• Weakness: less focused on massive-scale, multi-model long-term memory
Backboard complements Letta by supplying a high-accuracy, persistent memory layer that agents can read from and write to.

In short:
• RAG retrieves facts
• MemGPT structures agent memory
• Letta orchestrates agent behavior
• Backboard provides the reliable long-term memory that each of them needs

Why This Matters for the Future

Systems that remember will outperform systems that reset their context every time. Long-term continuity becomes the differentiator for personal assistants, business agents, and enterprise workflows. Memory is not a feature. It is infrastructure.

Next Steps

• Explore the LoCoMo benchmark
• Review API docs for memory threads
• Sign Up!

Changelog

Nov 13, 2025

AWS Bedrock + Enhanced File Upload

We have expanded our Model Library with full support for AWS Bedrock. Developers can now access a broad set of high-performance models through a single Backboard API, with unified memory, configuration, and monitoring.

Anthropic Claude Models

  • Claude 4.5 Haiku
    Fast and efficient with a 200K context window.

  • Claude 4.5 Sonnet
    Strong reasoning performance with prompt caching support.

  • Claude 4 Sonnet and Opus
    Premium models with 200K context windows.

  • Claude 3.x Series
    Includes Sonnet, Haiku, and legacy versions for backward compatibility.

Meta Llama Models

  • Llama 4 Maverick
    One-million-token context window with tool calling enabled.

  • Llama 4 Scout
    128K context and tool support.

  • Llama 3.3 (70B)
    High-performing 70B-parameter model.

  • Llama 3.2 Series
    Models ranging from 1B to 90B parameters.

  • Llama 3.1 Series
    8B and 70B instruct-tuned models.

Other Providers

  • DeepSeek R1
    Advanced reasoning capabilities for research-style workloads.

  • Mistral Pixtral Large
    Multimodal model with strong performance across vision and text tasks.

Enhanced File Upload API

We have added stronger guardrails across assistants, threads, and messages to help developers stay within resource limits without guesswork. Error messages now include detailed usage information to make troubleshooting straightforward.

Assistant Level

  • File Limit: 20 files

  • Token Limit: 5,000,000 tokens per file

  • File Size: Maximum 200 MB

Thread Level

  • File Limit: 20 files per thread

  • Combined Limit: 140 total files across assistants, threads, and attachments

Message Attachments

  • Per Message: Up to 10 attachments

  • Per Thread: Up to 100 attachments

  • Token Limit: 1,000,000 tokens per attachment

  • File Size: Maximum 10 MB

Message Input Limits

  • Character Limit: 200,000 characters per message

  • Token Limit: 50,000 tokens per message
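
These limits are easy to enforce client-side before a request ever leaves your app. A minimal sketch, using a rough four-characters-per-token estimate (an approximation, not Backboard's tokenizer):

MAX_CHARS = 200_000   # documented: characters per message
MAX_TOKENS = 50_000   # documented: tokens per message
CHARS_PER_TOKEN = 4   # rough heuristic, not Backboard's tokenizer

def check_message(text: str) -> None:
    """Raise with usage details before the API would reject the message."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"{len(text):,} chars exceeds the {MAX_CHARS:,}-char limit")
    est_tokens = len(text) // CHARS_PER_TOKEN
    if est_tokens > MAX_TOKENS:
        raise ValueError(f"~{est_tokens:,} tokens exceeds the {MAX_TOKENS:,}-token limit")

check_message("hello world")  # passes; oversized inputs raise with details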

These improvements create more predictable development workflows while still giving teams a wide range of flexibility. We will continue pushing these limits upward over time as part of our long-term effort to deliver near-infinite capacity.

Changelog

Nov 10, 2025

Embedding Models Now Available in Model Library

We’re excited to announce that embedding models are now fully integrated into the Backboard Model Library. This update expands the power and flexibility of your AI stack—giving developers direct access to embeddings across multiple providers, right alongside LLMs.

What’s New

You can now:

  • Browse 12 embedding models from OpenAI, Google, and Cohere in the Model Library

  • Filter by provider, dimensions, and model type

  • Use embedding models directly when creating or configuring assistants

New API Endpoints

Developers can programmatically access embedding models using the following endpoints:

GET /api/models/embedding/all           # List all embedding models
GET /api/models/embedding/{model_name}  # Get details for a specific embedding model
GET /api/models/embedding/providers     # List available embedding providers
GET /api/models?model_type=embedding    # Filter models by type
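
For example, listing and filtering from Python might look like this. The endpoint paths are the documented ones above; the base URL and bearer auth are assumptions.

import requests

BASE = "https://api.backboard.io"                    # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # assumed auth scheme

# List every embedding model, then drill into one by name.
models = requests.get(f"{BASE}/api/models/embedding/all", headers=HEADERS).json()
detail = requests.get(
    f"{BASE}/api/models/embedding/text-embedding-3-large", headers=HEADERS
).json()

# List providers, and filter the full library down to embedding models.
providers = requests.get(
    f"{BASE}/api/models/embedding/providers", headers=HEADERS
).json()
embeddings = requests.get(
    f"{BASE}/api/models", params={"model_type": "embedding"}, headers=HEADERS
).json()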

Available Models

OpenAI (3 models)

  • text-embedding-3-large (3072 dims)

  • text-embedding-3-small (1536 dims)

  • text-embedding-ada-002 (1536 dims)

Google (3 models)

  • gemini-embedding-001-768

  • gemini-embedding-001-1536

  • gemini-embedding-001-3072

Cohere (6 models)

  • embed-v4.0 (256, 512, 1024, 1536 dims)

  • embed-english-v3.0

  • embed-multilingual-v3.0

How to Use

When creating an assistant, you can now specify embedding parameters directly:

{
  "name": "My Assistant",
  "embedding_provider": "openai",
  "embedding_model_name": "text-embedding-3-large"
}

The selected model must exist in the Model Library.
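
One way to guard against typos is to verify the model against the Model Library first, using the detail endpoint above. A sketch, assuming the same base URL and auth as earlier and that unknown names return a 404:

import requests

BASE = "https://api.backboard.io"                    # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # assumed auth scheme

name = "text-embedding-3-large"
resp = requests.get(f"{BASE}/api/models/embedding/{name}", headers=HEADERS)
if resp.status_code == 404:                          # assumed error behavior
    raise SystemExit(f"{name} is not in the Model Library")
# Safe to reference the model in the assistant payload shown above.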

Compatibility

All updates are fully backward compatible—existing integrations and assistants continue to work without modification.

Embedding models open up new possibilities for retrieval, classification, search, and RAG workflows inside Backboard.

For support or questions, contact the Backboard team or visit backboard.io/docs.

Announcement

Nov 9, 2025

AI Memory vs RAG: What’s the Difference?

Every few weeks, someone on a dev forum asks, “Isn’t AI memory just RAG?”
It’s an understandable question. Both systems help models “remember” information, but they do it in completely different ways.

Understanding the difference isn’t just semantics. It determines how your AI behaves, scales, and how close it gets to real contextual intelligence.

The Short Answer

RAG (Retrieval-Augmented Generation) is about looking up information.
AI memory is about retaining and evolving it.

What RAG Actually Does

RAG connects a model to an external knowledge source like a vector database. When a user asks a question, the system retrieves the most relevant context chunks, passes them to the model, and generates a response.
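
That retrieve-then-generate loop is small enough to sketch end to end. Below, a toy keyword-overlap scorer stands in for a real embedding model and vector database; only the shape of the pipeline is the point.

# Toy RAG loop: score chunks against the query, take the top k, build a prompt.
DOCS = [
    "Backboard exposes embedding models from OpenAI, Google, and Cohere.",
    "Threads give conversations persistent continuity across sessions.",
    "LoCoMo is a benchmark for long-context conversational memory.",
]

def score(query: str, doc: str) -> int:
    """Keyword overlap as a stand-in for embedding similarity."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, k: int = 2) -> list[str]:
    return sorted(DOCS, key=lambda d: score(query, d), reverse=True)[:k]

query = "which benchmark measures long-context conversational memory?"
prompt = "Answer from this context:\n" + "\n".join(retrieve(query)) + "\n\nQ: " + query
print(prompt)  # sent to the model; note that nothing here survives the call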

RAG excels at:

  • Knowledge retrieval: Answering questions from documents, PDFs, or web sources.

  • Reducing hallucinations: Providing evidence-based grounding.

  • Scalability: You can update or swap the data store without retraining the model.

But RAG doesn’t remember anything. Each query is stateless. If you close the chat, the system forgets you ever existed.

What AI Memory Actually Does

AI memory adds a temporal layer—it lets the model accumulate and recall experiences across interactions. Instead of pulling static documents, it reconstructs contextual continuity:

  • User context: remembers preferences, facts, tone, and history.

  • Environmental context: understands where, when, and how interactions happen.

  • Evolving knowledge: stores model-generated insights over time.

Where RAG looks outward for static truth, AI memory looks inward for continuity and learning. It’s how systems begin to behave like never-ending databases of experience, not just retrieval machines.
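
In code, the difference is simply state that outlives a single call. A minimal sketch of the idea (an illustration, not Backboard's implementation):

# A tiny memory layer: facts written in one session are still there in the
# next, unlike a stateless RAG query.
from collections import defaultdict

class MemoryStore:
    def __init__(self) -> None:
        self._facts: dict[str, list[str]] = defaultdict(list)

    def write(self, user_id: str, fact: str) -> None:
        self._facts[user_id].append(fact)

    def recall(self, user_id: str, keyword: str) -> list[str]:
        """Toy lookup: return stored facts mentioning the keyword."""
        return [f for f in self._facts[user_id] if keyword.lower() in f.lower()]

memory = MemoryStore()

# Session 1: the system learns about the user.
memory.write("user-42", "prefers concise answers")
memory.write("user-42", "deploys to us-east-1")

# Session 2, days later and in a new conversation: the facts persist.
print(memory.recall("user-42", "deploy"))   # ['deploys to us-east-1']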

Why Developers Confuse Them

Many frameworks blur the line by calling vector stores “memory.” But a true contextual memory system does more than retrieve embeddings: it writes, organizes, and retrieves context dynamically.

In a proper architecture, RAG and memory work together:

  • RAG augments knowledge with external retrieval.

  • Memory preserves conversational or operational continuity.

You could think of it like this:

RAG finds facts.
Memory builds relationships.

When to Use Each

Use RAG when:

  • You need to access large, changing knowledge bases.

  • You’re answering factual or reference-heavy queries.

  • Your system doesn’t need long-term personalization.

Use AI memory when:

  • You need stateful or evolving context (like ongoing conversations or agents).

  • You’re building assistants that should “learn” over time.

  • You want models that retain user identity, task history, and preferences.

And in most real systems? You’ll want both.

How Backboard Approaches It

At Backboard, we treat memory as an architecture, not a feature.
Our API allows developers to configure both retrieval pipelines (RAG) and stateful, portable memory layers—working together or independently.

Developers can tune:

  • Memory mode: Auto, Read-only, or Off.

  • Embedding models and vector dimensions: to optimize for precision and latency.

  • Persistence and portability: move your memory between models or agents instantly.
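
As a sketch of the portability point: the same thread reused across two different models, so accumulated memory carries over while the model underneath changes. Paths, fields, and model identifiers are illustrative assumptions.

import requests

BASE = "https://api.backboard.io"                    # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # assumed auth scheme
thread_id = "th_123"                                 # existing thread with memory

# Ask one model...
requests.post(f"{BASE}/api/threads/{thread_id}/messages", headers=HEADERS,
              json={"model": "claude-sonnet-4-5", "content": "Summarize my project."})

# ...then switch models mid-thread. The memory layer, not the model,
# is what holds the context.
requests.post(f"{BASE}/api/threads/{thread_id}/messages", headers=HEADERS,
              json={"model": "llama-4-maverick", "content": "Continue where we left off."})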

It’s how Backboard achieved record-setting contextual performance on the LoCoMo benchmark (90.1% accuracy), proving that true memory systems outperform retrieval-only setups.

The Takeaway

RAG and AI memory are complementary, not competitive.
Retrieval gives AI external knowledge.
Memory gives it personal history.
Combined, they create agents that are both informed and aware.
