`WebSocketAudioAdapter`: Stream audio directly from your browser using WebSockets.
Previously, connecting to the `RealtimeAgent` meant using `TwilioAudioAdapter`. While effective, this approach required a setup-intensive process involving Twilio integration, account configuration, number forwarding, and other complexities. Today, we’re excited to introduce the `WebSocketAudioAdapter`, a streamlined approach to real-time audio streaming directly via a web browser.
This post explores the features, benefits, and implementation of the `WebSocketAudioAdapter`, showing how it transforms the way we connect with real-time agents.
`TwilioAudioAdapter` provides a robust way to connect to your `RealtimeAgent`, but it comes with challenges.
The `WebSocketAudioAdapter` eliminates these challenges by allowing direct audio streaming over WebSockets. It integrates seamlessly with modern web technologies, enabling real-time voice interactions without external telephony platforms.
The `WebSocketAudioAdapter` leverages WebSockets to handle real-time audio streaming. This means your browser becomes the communication bridge, sending audio packets to a server where a `RealtimeAgent` processes them.
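To make "audio packets" concrete: one common way to ship raw audio over a WebSocket is to base64-encode each chunk and wrap it in a small JSON text message. The sketch below shows that round trip in plain Python, with no network involved. The message fields (`event`, `payload`) and the helper names are hypothetical, chosen for illustration; they are not taken from the `WebSocketAudioAdapter` implementation.

```python
import base64
import json
import struct


def encode_audio_packet(samples: list[int]) -> str:
    """Pack 16-bit PCM samples into a JSON text message (sender side)."""
    raw = struct.pack(f"<{len(samples)}h", *samples)  # little-endian int16
    return json.dumps({
        "event": "media",  # hypothetical field names, for illustration only
        "payload": base64.b64encode(raw).decode("ascii"),
    })


def decode_audio_packet(message: str) -> list[int]:
    """Recover the PCM samples from a received message (receiver side)."""
    data = json.loads(message)
    raw = base64.b64decode(data["payload"])
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


packet = encode_audio_packet([0, 1200, -1200, 32767])
restored = decode_audio_packet(packet)
```

In a real deployment the encoding side would run in the browser and the decoding side on the server; both are shown in Python here only to keep the example in one language.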
Here’s a quick overview of its components and how they fit together:
The server passes the incoming audio to the `RealtimeAgent`, allowing the agent to process audio inputs and respond intelligently.
Unlike the `TwilioAudioAdapter`, the `WebSocketAudioAdapter` requires no phone numbers, no telephony configuration, and no external accounts. It’s a plug-and-play solution.
The following example uses the `WebSocketAudioAdapter` to create a voice-enabled weather bot.
You can find the full example here.
To run the demo example, follow these steps:
Create an `OAI_CONFIG_LIST` file based on the provided `OAI_CONFIG_LIST_sample`.
Set the `api_key` to your OpenAI and/or Gemini API keys.
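A quick way to catch a forgotten key before starting the server is to load the file and check each entry. The sketch below assumes `OAI_CONFIG_LIST` is a JSON array of objects that each carry an `api_key` field. The helper name `find_unset_keys`, the use of a `model` field for labeling, and the `<...>` placeholder convention are illustrative assumptions rather than something the demo defines.

```python
import json
from pathlib import Path


def find_unset_keys(path: str) -> list[str]:
    """Describe every config entry whose api_key looks unset."""
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    problems = []
    for index, entry in enumerate(entries):
        key = entry.get("api_key", "")
        # Treat empty values and "<...>" placeholders as not yet filled in.
        if not key or (key.startswith("<") and key.endswith(">")):
            label = entry.get("model", "unknown model")
            problems.append(f"entry {index} ({label}): api_key is not set")
    return problems
```

Calling `find_unset_keys("OAI_CONFIG_LIST")` from the project directory returns an empty list once every key has been filled in.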
Install the dependencies using `pip`.
Then start chatting with the `RealtimeAgent`! 🎤✨
`Jinja2Templates` is used to load `chat.html` from the `templates` directory. The template is dynamically rendered with variables like the server’s `port`.
Static files are served from the `static` directory.
The `/media-stream` WebSocket route is where real-time audio interaction is processed and streamed to the AI assistant. Let’s break it down step-by-step:
A client opens a WebSocket connection to `/media-stream`. Using `await websocket.accept()`, we ensure the connection is live and ready for communication.
A logger (`getLogger("uvicorn.error")`) is set up to monitor and debug the server’s activities, helping track events during the connection and interaction process.
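These first two steps can be sketched without a running server. The handler below mirrors them: accept the connection, then log through the `uvicorn.error` logger. `StubWebSocket` is a hypothetical stand-in so the snippet runs on its own; in the demo, the object would be the WebSocket that the web framework passes to the `/media-stream` route.

```python
import asyncio
from logging import getLogger


class StubWebSocket:
    """Hypothetical stand-in exposing only the accept() coroutine."""

    def __init__(self) -> None:
        self.accepted = False

    async def accept(self) -> None:
        self.accepted = True


async def handle_media_stream(websocket) -> None:
    # Step 1: complete the handshake so the connection is ready for use.
    await websocket.accept()
    # Step 2: fetch the "uvicorn.error" logger for server-side diagnostics.
    logger = getLogger("uvicorn.error")
    logger.info("Client connected to /media-stream")


stub = StubWebSocket()
asyncio.run(handle_media_stream(stub))
```

Keeping the handler dependent only on an object with an `accept()` coroutine makes this part easy to exercise in isolation before wiring in the audio adapter and the agent.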
The `WebSocketAudioAdapter` bridges the client’s audio stream with the `RealtimeAgent`. It streams audio data over WebSockets in real time, ensuring seamless communication between the browser and the agent.
The `RealtimeAgent` is the AI assistant driving the interaction. Key parameters include the agent’s name, here `"Weather Bot"`; `realtime_llm_config` for LLM settings; and the `WebSocketAudioAdapter` for handling audio.
The `get_weather` function is registered as a realtime callable function. When the user asks about the weather, the agent can call the function to get a weather report and respond based on the returned information. It returns `"The weather is cloudy."` for `"Seattle"` and `"The weather is sunny."` for other locations.
Calling `await realtime_agent.run()` starts the agent, which handles incoming audio streams, processes user queries, and responds in real time.
Together, these steps make up the `/media-stream` endpoint.
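The weather lookup itself is small enough to show in isolation. The sketch below is a plain Python function with the behavior described above. Registering it with the agent as a realtime callable is left out, and the `Annotated` hint on `location` is an assumption about how the parameter might be described to the model.

```python
from typing import Annotated


def get_weather(location: Annotated[str, "city"]) -> str:
    """Return a canned weather report for the given city."""
    # Hardcoded demo data: Seattle is cloudy, everywhere else is sunny.
    if location == "Seattle":
        return "The weather is cloudy."
    return "The weather is sunny."


seattle_report = get_weather("Seattle")
other_report = get_weather("Lisbon")
```

Because the function is an ordinary callable, it can be unit-tested without audio, a browser, or an API key, and later swapped for a call to a real weather service.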
The `WebSocketAudioAdapter` marks a shift toward simpler, more accessible real-time audio solutions. It empowers developers to build and deploy voice applications faster and more efficiently. Whether you’re creating an AI assistant, a voice-enabled app, or an experimental project, this adapter is your go-to tool for real-time audio streaming.
Try it out and bring your voice-enabled ideas to life!