CHANGELOG

Changelog

New features, improvements, and fixes shipped to Backboard.

Voice Comes to the Backboard API

API

Backboard has been a clean way to orchestrate LLMs across providers. Until now, that has meant text.

Today we are adding first-class voice support to the Backboard API:

  • Speech to Text (STT) with OpenAI and ElevenLabs

  • Text to Speech (TTS) with OpenAI and ElevenLabs

  • Three simple modes:

    1. Audio in → text out

    2. Text in → audio out

    3. Audio in → audio out

You do not need to wire up a new API. You keep using add_message / addMessage, add a voice object, and (for STT) pass an audio_file / audioFile. The SDKs handle multipart uploads, streaming, and provider quirks for you.

Voice is just another message capability

The core design choice: voice is part of the same assistants and threads model you already use.

You still:

  • Create an assistant

  • Create a thread

  • Call add_message / addMessage

Now you can include:

  • voice.stt to transcribe user audio

  • voice.tts to speak the LLM reply

  • audio_file / audioFile pointing at a local file for STT

Backboard runs the pipeline:

  1. STT with your chosen provider and model

  2. Routes the transcript into your configured LLM

  3. Optional TTS on the reply

  4. Returns text plus a voice_records object with transcripts, audio URLs, durations, and token usage
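Each mode example below reads voice_records with direct indexing. If you route several modes through one code path, a defensive reader avoids KeyErrors when a direction did not run. This is a minimal sketch. It assumes the key layout used in this changelog's examples (voice_records["stt"]["transcript"] and voice_records["tts"]["audio_url"]) and assumes that a direction you did not request has no entry. read_voice_records and VoiceSummary are hypothetical names. Durations and token usage are left out because their key names are not listed here.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class VoiceSummary:
    transcript: Optional[str]  # STT transcript, if STT ran
    audio_url: Optional[str]   # presigned TTS audio URL, if TTS ran


def read_voice_records(message: Mapping[str, Any]) -> VoiceSummary:
    """Pull the transcript and audio URL out of a message's voice_records.

    Hypothetical helper: assumes the key layout shown in the mode examples
    and treats a missing stt/tts entry as "that direction did not run".
    """
    records = message.get("voice_records") or {}
    stt = records.get("stt") or {}
    tts = records.get("tts") or {}
    return VoiceSummary(
        transcript=stt.get("transcript"),
        audio_url=tts.get("audio_url"),
    )


# Usage with a dict shaped like response.messages[-1] from an STT-only call:
msg = {"voice_records": {"stt": {"transcript": "What is the weather?"}}}
summary = read_voice_records(msg)
print(summary.transcript)  # What is the weather?
print(summary.audio_url)   # None
```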

You can even mix providers, like ElevenLabs STT with OpenAI TTS, in a single request.

Providers, models, and modes

Providers and models

Speech to Text (STT)

  • OpenAI: whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-transcribe-diarize

  • ElevenLabs: scribe_v1, scribe_v2

Text to Speech (TTS)

  • OpenAI: tts-1, tts-1-hd, gpt-4o-mini-tts

  • ElevenLabs: eleven_v3, eleven_multilingual_v2, eleven_flash_v2_5, eleven_turbo_v2_5, and more

You select provider and model per message, per direction.
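Because provider and model are chosen per direction, one easy mistake is pairing a model with the wrong provider or direction, such as a Scribe model in the TTS block. The lists above support a light pre-flight check. This is a minimal sketch: check_voice and KNOWN_MODELS are hypothetical names, and the model sets are copied from this changelog. The ElevenLabs TTS list ends with "and more", so the check only warns and never rejects. Backboard remains the source of truth.

```python
# Models listed in this changelog, keyed by direction and provider.
# Hypothetical snapshot: the ElevenLabs TTS list ends with "and more".
KNOWN_MODELS = {
    "stt": {
        "openai": {"whisper-1", "gpt-4o-transcribe", "gpt-4o-mini-transcribe",
                   "gpt-4o-transcribe-diarize"},
        "elevenlabs": {"scribe_v1", "scribe_v2"},
    },
    "tts": {
        "openai": {"tts-1", "tts-1-hd", "gpt-4o-mini-tts"},
        "elevenlabs": {"eleven_v3", "eleven_multilingual_v2",
                       "eleven_flash_v2_5", "eleven_turbo_v2_5"},
    },
}


def check_voice(voice: dict) -> list:
    """Return warnings for provider/model pairs not found in KNOWN_MODELS."""
    warnings = []
    for direction, cfg in voice.items():
        if direction not in KNOWN_MODELS:
            warnings.append(f"unknown direction {direction!r}")
            continue
        provider, model = cfg.get("provider"), cfg.get("model")
        models = KNOWN_MODELS[direction].get(provider)
        if models is None:
            warnings.append(f"{direction}: unknown provider {provider!r}")
        elif model not in models:
            warnings.append(f"{direction}: {model!r} not listed for {provider}")
    return warnings


# Mixed providers in one voice object: ElevenLabs STT, OpenAI TTS
voice = {
    "stt": {"provider": "elevenlabs", "model": "scribe_v2"},
    "tts": {"provider": "openai", "model": "gpt-4o-mini-tts", "voice": "alloy"},
}
print(check_voice(voice))  # []
```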

Three modes in a single API

Mode 1: STT + LLM

Audio in, text out

Send an audio file, get a transcript and the LLM reply.


Python
response = await client.add_message(
    thread_id=thread.thread_id,
    voice={"stt": {"provider": "openai", "model": "whisper-1", "language": "en"}},
    audio_file="user-question.mp3",
    send_to_llm="true",
    stream=False,
)
last = response.messages[-1]
print("Transcript:", last["voice_records"]["stt"]["transcript"])
print("Reply:", response.content)

Use this for call summaries, turning voice notes into tasks, or “talk instead of type” UX.

Mode 2: LLM + TTS

Text in, audio out

Send text, get the LLM reply plus a presigned audio URL.


Python
response = await client.add_message(
    thread_id=thread.thread_id,
    content="Say hello in one sentence",
    voice={"tts": {"provider": "openai", "model": "gpt-4o-mini-tts", "voice": "alloy"}},
    send_to_llm="true",
    stream=False,
)
last = response.messages[-1]
print("Reply:", response.content)
print("Audio URL:", last["voice_records"]["tts"]["audio_url"])

Great for voice-enabled agents, read-aloud summaries, and accessibility.
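Mode 2 returns a presigned URL, not raw audio bytes. For a local copy, for caching or offline playback, a plain HTTP GET on that URL should be enough. This is a minimal sketch that uses only the standard library. save_tts_audio is a hypothetical helper. Presigned URLs usually expire, so the sketch assumes you download soon after the response arrives.

```python
import shutil
import urllib.request
from pathlib import Path


def save_tts_audio(audio_url: str, dest: str, timeout: float = 30.0) -> Path:
    """Download the audio behind a presigned TTS URL to a local file.

    Hypothetical helper; assumes the URL is fetchable with a plain GET and
    has not yet expired.
    """
    path = Path(dest)
    with urllib.request.urlopen(audio_url, timeout=timeout) as resp, path.open("wb") as out:
        shutil.copyfileobj(resp, out)  # stream to disk instead of buffering in memory
    return path


# Usage, with audio_url taken from last["voice_records"]["tts"]["audio_url"]:
# save_tts_audio(audio_url, "reply.mp3")
```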

Mode 3: STT + LLM + TTS

Audio in, audio out

Full voice conversation in a single call.


Python
response = await client.add_message(
    thread_id=thread.thread_id,
    voice={
        "stt": {"provider": "openai", "model": "gpt-4o-transcribe", "language": "en"},
        "tts": {"provider": "openai", "model": "gpt-4o-mini-tts", "voice": "coral"},
    },
    audio_file="user-question.mp3",
    send_to_llm="true",
    stream=False,
)
vr = response.messages[-1]["voice_records"]
print("Transcript:", vr["stt"]["transcript"])
print("Reply:", response.content)
print("Audio URL:", vr["tts"]["audio_url"])

This is the fastest path from “text assistant” to “phone-like conversation.”

Streaming without rebuilding your stack

You can stream voice pipelines over the same API by setting stream=True.

Backboard emits:

  • STT events: stt_stream_start, stt_text_delta, stt_stream_end

  • LLM events: content_streaming

  • TTS events: tts_stream_start, tts_audio_chunk, tts_stream_end

Example: show STT and LLM deltas, and play TTS audio as it arrives:


Python
import base64
import subprocess

# Pipe decoded TTS audio into ffplay (part of FFmpeg) as it arrives
player = subprocess.Popen(
    ["ffplay", "-i", "pipe:0", "-nodisp", "-autoexit", "-loglevel", "quiet"],
    stdin=subprocess.PIPE,
)
async for chunk in await client.add_message(
    thread_id=thread.thread_id,
    voice={
        "stt": {"provider": "openai", "model": "gpt-4o-mini-transcribe", "language": "en"},
        "tts": {"provider": "openai", "model": "gpt-4o-mini-tts", "voice": "alloy"},
    },
    audio_file="user-question.mp3",
    send_to_llm="true",
    stream=True,
):
    t = chunk.get("type")
    if t == "stt_text_delta":
        # Partial transcript of the user's audio
        print(chunk["delta"], end="", flush=True)
    elif t == "content_streaming":
        # Partial LLM reply text
        print(chunk.get("content", ""), end="", flush=True)
    elif t == "tts_audio_chunk":
        # Base64-encoded audio bytes: decode and feed them to the player
        player.stdin.write(base64.b64decode(chunk["data"]))
        player.stdin.flush()
    elif t == "tts_stream_end":
        print(f"\nAudio URL: {chunk.get('audio_url')}")

# Signal end of audio to ffplay and wait for playback to finish
player.stdin.close()
player.wait()
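You may want the streamed pieces as data rather than printed output, for example to render the transcript and reply in a UI and hand the audio to your own player. In that case you can fold the events into one result. This is a minimal sketch that assumes the event names and fields used in the example above: delta, content, base64 data, and audio_url on tts_stream_end. collect_voice_stream is a hypothetical helper that accepts any async iterator of event dicts, so you can exercise it with a fake stream.

```python
import asyncio
import base64
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VoiceStreamResult:
    transcript: str = ""
    reply: str = ""
    audio: bytearray = field(default_factory=bytearray)
    audio_url: Optional[str] = None


async def collect_voice_stream(events) -> VoiceStreamResult:
    """Accumulate STT deltas, LLM text, and decoded TTS audio from a stream.

    Hypothetical helper; field names follow the streaming example above.
    """
    result = VoiceStreamResult()
    async for chunk in events:
        t = chunk.get("type")
        if t == "stt_text_delta":
            result.transcript += chunk.get("delta", "")
        elif t == "content_streaming":
            result.reply += chunk.get("content", "")
        elif t == "tts_audio_chunk":
            result.audio.extend(base64.b64decode(chunk["data"]))
        elif t == "tts_stream_end":
            result.audio_url = chunk.get("audio_url")
        # start/end markers (stt_stream_start, tts_stream_start, ...) are ignored here
    return result


async def fake_stream():
    # Stand-in for the iterator returned by add_message(..., stream=True)
    for event in [
        {"type": "stt_stream_start"},
        {"type": "stt_text_delta", "delta": "Hello "},
        {"type": "stt_text_delta", "delta": "there"},
        {"type": "stt_stream_end"},
        {"type": "content_streaming", "content": "Hi!"},
        {"type": "tts_stream_start"},
        {"type": "tts_audio_chunk", "data": base64.b64encode(b"\x00\x01").decode()},
        {"type": "tts_stream_end", "audio_url": "https://example.com/reply.mp3"},
    ]:
        yield event


result = asyncio.run(collect_voice_stream(fake_stream()))
print(result.transcript)  # Hello there
print(result.reply)       # Hi!
```

Against the real API you would pass the awaited add_message(..., stream=True) result in place of fake_stream().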

For the special case of real-time microphone audio to ElevenLabs scribe_v2_realtime, we also expose a dedicated Real Time Voice WebSocket API. For everything else, add_message streaming is enough.

Provider options without vendor lock-in

You still have access to provider specific features through provider_options, without learning each raw API.

Examples:

  • OpenAI STT: response_format, timestamp_granularities, temperature, prompt

  • ElevenLabs STT: diarize, num_speakers, and event tagging

  • OpenAI TTS: speed, instructions for tone

  • ElevenLabs TTS: voice_settings, language_code, and more


Python
voice={
    "stt": {
        "provider": "elevenlabs",
        "model": "scribe_v2",
        "language": "en",
        "provider_options": {
            "elevenlabs": {
                "diarize": True,
                "timestamps_granularity": "word",
            }
        },
    },
    "tts": {
        "provider": "openai",
        "model": "gpt-4o-mini-tts",
        "voice": "coral",
        "output_format": "mp3",
        "provider_options": {
            "openai": {"instructions": "Calm and professional"}
        },
    },
}
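In the example above, provider_options appears to be namespaced by provider name. That makes it easy to nest options under the wrong key, for instance openai options inside an ElevenLabs STT block. Assuming Backboard reads only the key that matches the selected provider, a small lint can flag such mismatches. This is a hypothetical sketch: mismatched_provider_options is an illustrative name, and this changelog does not say whether Backboard ignores or rejects mismatched keys.

```python
def mismatched_provider_options(voice: dict) -> list:
    """List provider_options keys that don't match each direction's provider.

    Hypothetical lint; assumes options are namespaced by provider name,
    as in the example above.
    """
    problems = []
    for direction, cfg in voice.items():
        provider = cfg.get("provider")
        for key in cfg.get("provider_options", {}):
            if key != provider:
                problems.append(f"{direction}: options for {key!r} but provider is {provider!r}")
    return problems


voice = {
    "stt": {
        "provider": "elevenlabs",
        "model": "scribe_v2",
        "provider_options": {"openai": {"temperature": 0}},  # wrong namespace
    },
    "tts": {
        "provider": "openai",
        "model": "gpt-4o-mini-tts",
        "provider_options": {"openai": {"instructions": "Calm and professional"}},
    },
}
print(mismatched_provider_options(voice))
# ["stt: options for 'openai' but provider is 'elevenlabs'"]
```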

Backboard normalizes everything into voice_records.stt and voice_records.tts, with provider_output available if you need the raw response.

Audio formats, languages, and limits

STT input formats:

  • OpenAI: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm up to 25 MB

  • ElevenLabs: all major audio and video formats up to 3 GB
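Checking a file locally before upload saves a round trip on a request that would fail. This is a minimal sketch based only on the limits listed above. check_stt_upload is a hypothetical helper. It treats MB and GB as binary units, which is an assumption because the exact byte definition is not stated here. For ElevenLabs it checks size only, since "all major audio and video formats" is not an exact list.

```python
from pathlib import Path

OPENAI_STT_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
MB = 1024 * 1024  # assumption: MB/GB interpreted as binary units
STT_SIZE_LIMITS = {"openai": 25 * MB, "elevenlabs": 3 * 1024 * MB}


def check_stt_upload(path: str, provider: str) -> list:
    """Return problems with an audio file for the given STT provider.

    Hypothetical pre-upload check; raises KeyError for an unknown provider.
    """
    p = Path(path)
    problems = []
    size = p.stat().st_size
    limit = STT_SIZE_LIMITS[provider]
    if size > limit:
        problems.append(f"{p.name} is {size} bytes; {provider} limit is {limit}")
    ext = p.suffix.lower().lstrip(".")
    if provider == "openai" and ext not in OPENAI_STT_FORMATS:
        problems.append(f"{p.name}: .{ext} is not an OpenAI STT input format")
    return problems


# Usage: check_stt_upload("user-question.mp3", "openai") returns [] when the file is fine
```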

TTS output formats:

  • OpenAI: mp3 (default), opus, aac, flac, wav, pcm

  • ElevenLabs: mp3_*, pcm_*, opus_*, wav_*, ulaw_*, alaw_* variants

Languages use ISO 639-1 codes like en, es, fr. Pass language to force one, or omit it to let providers auto-detect.
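Because language is optional, one approach is to build the STT block so the key is sent only when you want to force a language. This is a minimal sketch. stt_config is a hypothetical helper. Its check confirms only that the value looks like a two-letter code. It does not validate the value against the full ISO 639-1 list.

```python
import re
from typing import Optional

_TWO_LETTER = re.compile(r"[a-z]{2}")  # shape check only, not the full ISO 639-1 list


def stt_config(provider: str, model: str, language: Optional[str] = None) -> dict:
    """Build a voice["stt"] block; omit language to let the provider auto-detect.

    Hypothetical helper for illustration.
    """
    cfg = {"provider": provider, "model": model}
    if language is not None:
        if not _TWO_LETTER.fullmatch(language):
            raise ValueError(f"expected a two-letter ISO 639-1 code, got {language!r}")
        cfg["language"] = language
    return cfg


print(stt_config("openai", "whisper-1", "es"))
# {'provider': 'openai', 'model': 'whisper-1', 'language': 'es'}
print(stt_config("elevenlabs", "scribe_v2"))
# {'provider': 'elevenlabs', 'model': 'scribe_v2'}
```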

Note: OpenAI requires you to clearly tell users that OpenAI TTS voices are AI generated, not real people. Make sure your UX and terms reflect that.

Why we built it this way

This release follows the same principles as the rest of Backboard:

  • One API, many providers
    You can swap or mix OpenAI and ElevenLabs without rewriting your application.

  • Pipelines are first class
    STT, LLM, TTS, and streaming events are modeled as a single pipeline, not three separate integrations.

  • Cost and token visibility
    voice_records include transcript length, durations, token counts, and audio tokens so you can reason about usage instead of guessing.

  • Same data posture across modalities
    Voice data follows the same rules as text. We do not train on your or your users’ content.

Getting started

If you are already on the Backboard SDKs:

  1. Upgrade to the latest Python or JS/TS SDK.

  2. Add a voice object to your next add_message / addMessage call.

  3. Include audio_file / audioFile for STT.

  4. Turn on stream=True if you want deltas and audio chunks.

If you are not using Backboard yet and want to experiment with voice, reach out for access and sample projects. You can start with a simple “audio in, text out” assistant and grow into full voice conversations once it proves useful.

Introducing Backboard’s Image Tool: Streaming-native Image Generation for Your Agents

API

Today we are launching Image Tool, a built‑in image generation capability for Backboard threads. It lets your agents create and edit images inside normal conversations, with streaming support and first‑class document IDs for follow‑ups.

If you are already using Backboard for text agents, you do not need a new API. You just turn image generation on per message, choose an image model, and let the thread LLM decide when to call the built‑in generate_image tool.

How it works

Image Tool is enabled per message through three new fields:

  • image_generation="auto"

  • image_model_provider

  • image_model_name

You send a normal conversation message. The thread LLM chooses when to call the tool and Backboard handles the image generation and media events for you.
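The three fields always travel together, so it can be convenient to keep them in one place and spread them into each call. This is a minimal sketch. image_fields is a hypothetical helper, and its defaults are simply the provider and model used in this changelog's examples, not a recommendation.

```python
def image_fields(
    provider: str = "openrouter",
    model: str = "google/gemini-3.1-flash-image-preview",
) -> dict:
    """Return the per-message fields that enable Backboard's image tool.

    Hypothetical helper; defaults mirror the examples in this changelog.
    """
    return {
        "image_generation": "auto",  # the thread LLM decides when to call generate_image
        "image_model_provider": provider,
        "image_model_name": model,
    }


# Usage:
# await client.send_message(prompt, llm_provider="openai", model_name="gpt-4.1", **image_fields())
print(image_fields()["image_generation"])  # auto
```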

This gives you:

  • Text‑to‑image inside any agent conversation

  • Image‑to‑image edits within the same thread

  • Streaming image URLs and document_ids via media_generated events

  • A simple upgrade path from text‑only agents to multimodal experiences

Python SDK: text‑to‑image in one call

The Python SDK exposes the image fields directly on send_message().


Python
import asyncio
from backboard import BackboardClient

async def main():
    client = BackboardClient(api_key="YOUR_API_KEY")
    response = await client.send_message(
        "Generate a square image of a tiny red robot watering a sunflower in a sunny garden.",
        llm_provider="openai",
        model_name="gpt-4.1",
        image_generation="auto",
        image_model_provider="openrouter",
        image_model_name="google/gemini-3.1-flash-image-preview",
    )
    print(response.content)
    print(response.thread_id)
    print(response.assistant_id)

asyncio.run(main())

You keep your usual text workflow. By adding the image fields, you unlock image generation without managing a separate media pipeline.

Streaming: capture images as they are generated

Most image experiences feel better when they stream. Backboard surfaces images through media_generated events on the same stream you already use for content.


Python
import asyncio
from backboard import BackboardClient

async def main():
    client = BackboardClient(api_key="YOUR_API_KEY")
    stream = await client.send_message(
        "Generate a square image of a tiny red robot watering a sunflower in a sunny garden.",
        llm_provider="openai",
        model_name="gpt-4.1",
        image_generation="auto",
        image_model_provider="openrouter",
        image_model_name="google/gemini-3.1-flash-image-preview",
        stream=True,
    )
    generated_document_id = None
    async for chunk in stream:
        event_type = chunk.get("type")
        if event_type == "media_generated":
            media = chunk.get("media", {})
            generated_document_id = media.get("document_id")
            print("image url:", media.get("url"))
            print("document_id:", generated_document_id)
        if event_type == "content_streaming":
            print(chunk.get("content", ""), end="", flush=True)
    print()
    print("final document_id:", generated_document_id)

asyncio.run(main())

Key behavior:

  • content_streaming delivers partial text as usual

  • media_generated gives you the image URL and a stable document_id you can store for later edits or display in your app
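The example above keeps only the last document_id. If one response can produce more than one image, those earlier IDs are lost. That could happen if the LLM calls generate_image more than once, though this is an assumption that this changelog does not confirm. The sketch below collects every media_generated payload along with the thread_id. It assumes the chunk fields used in this changelog's examples: type, media.url, media.document_id, and a top-level thread_id on some chunks. collect_media is a hypothetical helper that works with any async iterator of dicts.

```python
import asyncio
from typing import Any, AsyncIterator, Optional


async def collect_media(stream: AsyncIterator[dict]) -> tuple:
    """Return (thread_id, media items, streamed text) from a send_message stream.

    Hypothetical helper; chunk field names follow the examples in this changelog.
    """
    thread_id: Optional[str] = None
    media_items: list = []
    text_parts: list = []
    async for chunk in stream:
        if chunk.get("thread_id"):
            thread_id = chunk["thread_id"]
        event_type = chunk.get("type")
        if event_type == "media_generated":
            media: dict[str, Any] = chunk.get("media") or {}
            media_items.append({"url": media.get("url"), "document_id": media.get("document_id")})
        elif event_type == "content_streaming":
            text_parts.append(chunk.get("content", ""))
    return thread_id, media_items, "".join(text_parts)


async def fake_stream():
    # Stand-in for `await client.send_message(..., stream=True)`
    yield {"type": "content_streaming", "content": "Here is your robot.", "thread_id": "t_1"}
    yield {"type": "media_generated", "media": {"url": "https://example.com/a.png", "document_id": "doc_a"}}


thread_id, media, text = asyncio.run(collect_media(fake_stream()))
print(thread_id, [m["document_id"] for m in media])  # t_1 ['doc_a']
```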

Image‑to‑image: edit by document_id

Because images are treated as first‑class documents, you can perform follow‑up edits by referencing the prior document_id inside the same thread.


Python
import asyncio
from backboard import BackboardClient

async def main():
    client = BackboardClient(api_key="YOUR_API_KEY")
    # First turn: generate the original image and capture its IDs from the stream
    first = await client.send_message(
        "Generate a square image of a tiny red robot watering a sunflower in a sunny garden.",
        llm_provider="openai",
        model_name="gpt-4.1",
        image_generation="auto",
        image_model_provider="openrouter",
        image_model_name="google/gemini-3.1-flash-image-preview",
        stream=True,
    )
    document_id = None
    thread_id = None
    async for chunk in first:
        if chunk.get("thread_id"):
            thread_id = chunk.get("thread_id")
        if chunk.get("type") == "media_generated":
            document_id = (chunk.get("media") or {}).get("document_id")
    if not thread_id or not document_id:
        raise RuntimeError("Missing thread_id or document_id from first image generation")
    # Second turn: ask for an edit of that image in the same thread
    second = await client.send_message(
        (
            f"Edit the generated image with document_id {document_id}. "
            "Pass this exact id as input_image_document_id to the generate_image tool. "
            "Make the robot blue and add a small crescent moon in the sky."
        ),
        thread_id=thread_id,
        llm_provider="openai",
        model_name="gpt-4.1",
        image_generation="auto",
        image_model_provider="openrouter",
        image_model_name="google/gemini-3.1-flash-image-preview",
        stream=True,
    )
    async for chunk in second:
        if chunk.get("type") == "media_generated":
            media = chunk.get("media", {})
            print("edited image url:", media.get("url"))
            print("edited document_id:", media.get("document_id"))

asyncio.run(main())

This pattern lets you build iterative design flows. Your users can ask an agent to refine an existing image, and your app only needs to keep track of thread_id and document_id.
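Because the app needs only those two IDs, a small per-session record can drive repeated edits. Keeping every document_id in order also lets a user step back to an earlier version. This is a minimal sketch. ImageSession is a hypothetical class, and its prompt wording mirrors the example above, which relies on the thread LLM passing the id through as input_image_document_id.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ImageSession:
    """Hypothetical per-user record of an iterative image-editing session."""
    thread_id: Optional[str] = None
    history: list = field(default_factory=list)  # document_ids, oldest first

    @property
    def latest(self) -> Optional[str]:
        return self.history[-1] if self.history else None

    def record(self, thread_id: str, document_id: str) -> None:
        """Store the IDs from a media_generated event."""
        self.thread_id = thread_id
        self.history.append(document_id)

    def edit_prompt(self, instruction: str) -> str:
        """Build an edit request for the most recent image."""
        if self.latest is None:
            raise RuntimeError("no image generated yet")
        return (
            f"Edit the generated image with document_id {self.latest}. "
            "Pass this exact id as input_image_document_id to the generate_image tool. "
            f"{instruction}"
        )


session = ImageSession()
session.record("t_1", "doc_a")
print(session.edit_prompt("Make the robot blue."))
```

Each follow-up would then go to send_message(session.edit_prompt(...), thread_id=session.thread_id, ...), and the new document_id from the stream is recorded before the next round.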

JavaScript and TypeScript

The current JavaScript SDK helper does not yet expose image fields, so you call the HTTP API directly with the standard /threads/messages request.


TypeScript
const response = await fetch("https://app.backboard.io/api/threads/messages", {
  method: "POST",
  headers: {
    "X-API-Key": "YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    content: "Generate a square image of a tiny red robot watering a sunflower in a sunny garden.",
    llm_provider: "openai",
    model_name: "gpt-4.1",
    image_generation: "auto",
    image_model_provider: "openrouter",
    image_model_name: "google/gemini-3.1-flash-image-preview",
    stream: false,
  }),
});
const result = await response.json();
console.log(result.content);


This uses the same fields and behavior as the Python SDK. When streaming is enabled, you receive media_generated events in the response stream.
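You can send the same raw request from Python without the SDK, which may be useful in scripts that should avoid the dependency. This standard-library sketch mirrors the TypeScript request above, with the same URL, headers, and body fields. build_image_request is a hypothetical helper, and only the non-streaming case is shown.

```python
import json
import urllib.request

API_URL = "https://app.backboard.io/api/threads/messages"


def build_image_request(api_key: str, content: str) -> urllib.request.Request:
    """Build the POST /threads/messages request from the TypeScript example.

    Hypothetical helper; non-streaming only.
    """
    body = {
        "content": content,
        "llm_provider": "openai",
        "model_name": "gpt-4.1",
        "image_generation": "auto",
        "image_model_provider": "openrouter",
        "image_model_name": "google/gemini-3.1-flash-image-preview",
        "stream": False,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_image_request("YOUR_API_KEY", "Generate a square image of a tiny red robot.")
# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["content"])
```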

Parameters at a glance

  • image_generation: set to "auto" to enable the built-in image tool

  • image_model_provider: provider for the image model, such as openrouter or google

  • image_model_name: an image-capable model name

  • thread_id: reuse the same thread for follow-up edits

  • stream: enables media_generated events for streaming images

Recommended setup

To get the best experience:

  1. Use a normal thread LLM such as gpt-4.1 for conversation

  2. Configure an image model with image_model_provider and image_model_name

  3. Turn on streaming so you can react to media_generated events

  4. Store document_id values for any images you want to edit or redisplay later

What you can build with Image Tool

Some examples that work out of the box:

  • UI and marketing agents that generate hero images and iterate on them with user feedback

  • Product customization flows that let users start from a base template and refine by prompt

  • Design review bots that annotate or modify existing images based on natural language instructions

Any place you already rely on Backboard threads, you can now add images without standing up a separate media service.

Build AI Faster with Backboard’s New Unified API

API

What's New

Backboard’s API is now simpler, faster, and easier to integrate. Developers get a one-call flow for building stateful AI systems without manually stitching together context, memory, tools, and configuration.

One-Call Conversations
  • Start a conversation with a single request.

  • Automatically handle context and memory behind the scenes.

  • Apply system instructions, tools, model selection, and memory behavior per interaction.

  • Continue seamlessly by referencing the same thread.

Streamlined API Surface
  • A unified interaction flow now handles input, response, context, memory, and system behavior in one lifecycle.

  • Simplified tool handling reduces the amount of state developers need to track manually.

  • Run controls make it easier to manage in-progress requests, including stopping execution when needed.

Memory Management
  • Reset or adjust memory state without rebuilding your system.

  • Configure memory behavior per request without affecting global settings.

  • Keep persistent configurations available through dedicated endpoints when needed.

Per-Interaction Configuration

Each request can define its own:

  • system instructions

  • tool usage

  • model selection

  • memory behavior

This makes every interaction behave exactly as specified, without unintended side effects across threads or assistants.

Updated SDKs + Docs
  • Python and JavaScript/TypeScript SDKs now support the simplified interaction model.

  • Docs have been rewritten with clearer examples, better organization, and updated guides for streaming and tool usage.

Backward Compatibility

All existing endpoints remain supported.

  • No forced migrations.

  • No migration deadlines.

  • Teams can adopt the new API at their own pace.

How to Use It

Send a single request to start a stateful interaction, then continue the conversation by referencing the same thread.
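As a rough illustration of continuing by thread, the sketch below reuses the send_message call and the response fields (content, thread_id) that appear in the Image Tool examples in this changelog. converse is hypothetical, and it accepts any client object with that method, so you can try it with a stand-in. Per-request options such as system instructions, tools, and memory behavior are passed through unchanged, because their field names are not listed here.

```python
import asyncio
from types import SimpleNamespace


async def converse(client, prompts, **options):
    """Send prompts in order on one thread; return (thread_id, replies).

    Hypothetical helper; assumes send_message accepts thread_id and returns
    an object with content and thread_id, as in the Image Tool examples.
    """
    thread_id = None
    replies = []
    for prompt in prompts:
        kwargs = dict(options)
        if thread_id is not None:
            kwargs["thread_id"] = thread_id  # continue the same conversation
        response = await client.send_message(prompt, **kwargs)
        thread_id = response.thread_id  # the first call starts the thread
        replies.append(response.content)
    return thread_id, replies


class FakeClient:
    """Stand-in that mimics only the response fields used above."""

    def __init__(self):
        self.calls = []

    async def send_message(self, content, **kwargs):
        self.calls.append(kwargs)
        return SimpleNamespace(content=f"echo: {content}", thread_id=kwargs.get("thread_id", "t_new"))


fake = FakeClient()
thread_id, replies = asyncio.run(
    converse(fake, ["Hi", "And again"], llm_provider="openai", model_name="gpt-4.1")
)
print(thread_id, replies)  # t_new ['echo: Hi', 'echo: And again']
```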

Explore how Backboard simplifies AI infrastructure 👉 https://backboard.io/use-case