
Announcement
Feb 12, 2026
Backboard.io Becomes First AI Platform to Lead Both Major Memory Benchmarks
Backboard.io today announced state-of-the-art results across the two leading AI memory benchmarks, LoCoMo and LongMemEval, reinforcing its position as the foundational AI stack for production-grade and agentic systems.
An independent evaluation conducted by NewMathData, a Texas-based engineering firm and AWS Small Partner of the Year, measured Backboard’s performance on the LongMemEval benchmark using the benchmark’s original academic specification. Backboard achieved 93.4% overall accuracy, the highest publicly reported result under consistent methodology and a material margin ahead of other reported systems.
During post-evaluation review, Backboard and the independent evaluator identified multiple instances where Backboard’s responses were marked incorrect despite being more precise and semantically accurate than the benchmark’s expected answer. In these cases, Backboard answered the question as written, incorporating factual context already present in the interaction, while the benchmark’s “gold” answer reflected a narrower or alternate interpretation of the prompt. As a result, the reported LongMemEval score should be considered a conservative lower bound on performance rather than an upper limit.
These results build on Backboard’s previously published 90.1% accuracy on the LoCoMo benchmark, which is publicly available and reproducible via GitHub. Achieving state-of-the-art performance on both benchmarks is uncommon, as most systems optimize for either short-horizon precision or long-horizon persistence, but not both.
Importantly, Backboard did not set out to optimize for benchmarks. The LongMemEval evaluation was initiated and run independently, and the LoCoMo benchmark was explored simply to understand where Backboard fit relative to academic research. The results reflect system-level behavior, not benchmark-specific tuning.
“We didn’t build Backboard to chase benchmarks,” said Rob Imbeault, founder of Backboard.io. “We built it to solve real problems that show up when AI systems run for a long time, across multiple agents, under real constraints. The benchmarks just happened to confirm what we were already seeing in practice.”
Independent Validation of What “Memory” Really Means
In a recent analysis published by the Ottawa Business Journal, Adyasha Maharana, creator of the LoCoMo benchmark and research scientist at Databricks, clarified an important distinction often lost in AI evaluations.
“The dataset is designed to examine not just an LLM but any LLM-based system’s capabilities and blindspots in a fine-grained manner,” Maharana explained. “Raw human performance is somewhere around 88 percent. Breaking the 90-percent threshold requires superhuman consistency in recall and reasoning.”
She further noted that most high-performing frontier models currently score around 80 percent on LoCoMo when evaluated by feeding the full conversation as a single prompt.
“Strictly speaking, this is not memory,” she said. “This is simply understanding whether the LLM pays attention to each part of its input and is able to reason over it correctly. The system built by Backboard.io is a far better attempt at simulating memory as it manifests in humans. It is practical, cheaper, scalable and doesn’t rely solely on brute-force LLM processing for answers.”
This distinction underscores why Backboard’s results reflect more than model capacity. They demonstrate a system-level approach to memory that persists, evolves, and remains reliable over time.
A Complete AI Stack, Not a Bolt-On Component
Backboard.io is not a router, a wrapper, or a memory plugin. It is a unified AI infrastructure stack designed to serve as the starting point for modern AI systems.
From a single API, Backboard provides:
Persistent long-term memory
Native embeddings and vectorization
Retrieval-augmented generation (RAG)
Shared memory across agents
Access to more than 17,000 large language models, including a bring-your-own-API-key option
By integrating memory, embeddings, retrieval, and model access into one system, Backboard eliminates the need for enterprises to stitch together fragile chains of open-source components. Memory is treated as first-class infrastructure, not application logic.
This architecture allows systems to evolve without breaking:
Models can be swapped without losing continuity
Agents can coordinate while sharing state
Retrieval strategies can change without rewrites
Systems remain coherent as complexity grows
Making Agentic AI Practical
As interest in agentic AI accelerates, many systems fail to move beyond isolated demos because memory is treated as an afterthought. Without reliable, shared memory, agents fragment, hallucinate, and reset.
Backboard addresses this constraint directly by enabling persistent, shared memory across many agents, even when those agents run on different underlying models. When memory is solved, agentic behavior emerges naturally rather than being scripted.
“Agentic AI doesn’t become meaningful because you call something an agent,” said Imbeault. “It becomes meaningful when agents can remember, coordinate, and operate over time. Solving memory is the prerequisite.”
Backboard.io’s architecture is built around Active Temporal Resonance, a memory framework designed to preserve meaning and continuity as interactions unfold. By maintaining temporal coherence rather than reconstructing state through static graphs or repeated retrieval, Backboard enables systems that remain consistent, auditable, and trustworthy at scale.
Built by a Founder Enterprises Already Trust
Imbeault previously founded Assent, a platform trusted by Fortune 100 companies to manage complex supply-chain and regulatory-compliance workflows. That experience informed Backboard’s focus on durability, correctness, and trust from day one.
“Enterprise systems don’t get to reset,” said Imbeault. “If they lose context or trust, they fail. That mindset shaped how we built Backboard.”
What Comes Next
With foundational memory validated across independent and academic benchmarks, Backboard is turning its attention to how teams evaluate and reason about complex AI systems in practice.
The company will soon introduce Switchboard, a new capability designed to help developers and enterprises better understand how different AI system configurations behave under real-world constraints. Additional details will be shared in the coming weeks.
“The future of AI isn’t about clever tricks or bolt-ons,” said Imbeault. “It’s about building systems that can be trusted over time. Memory is the foundation, and that’s where enterprises should start.”
Additional details on Backboard’s benchmark results are available on the company’s website and GitHub repository. The LongMemEval evaluation report and supporting materials will be released publicly.

