
Audio and Voice Tutorial

RoboCrew allows your robot to listen for voice commands using wake-word detection and respond verbally using Text-to-Speech (TTS). This creates a hands-free “Intelligence Loop” where the robot perceives, reasons, and acts based on your spoken instructions.

To use the audio features, provide a microphone device index and enable the TTS flag when initializing the LLMAgent or XLeRobotAgent:

```python
agent = XLeRobotAgent(
    model="google_genai:gemini-3-flash-preview",
    tools=[...],
    sounddevice_index=2,  # 🎙️ Provide your microphone device index
    wakeword="Bob",       # 🗣️ Custom wake-word (default is "robot")
    tts=True,             # 🔊 Enable Text-to-Speech
    # ... other params
)
```

Before running the code, ensure your system has the necessary audio libraries installed for handling microphone input:

```shell
sudo apt install portaudio19-dev
pip install pyaudio audioop-lts
```
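If you are unsure which value to pass as `sounddevice_index`, you can enumerate your audio devices. The helper below is a hypothetical sketch (not part of RoboCrew) that filters PyAudio-style device-info dicts down to devices that can actually record:

```python
# Sketch: finding a candidate value for sounddevice_index.
# input_device_indices is a hypothetical helper; the dicts mirror the
# shape returned by PyAudio's get_device_info_by_index().

def input_device_indices(devices):
    """Indices of devices that can record audio (have input channels)."""
    return [d["index"] for d in devices if d["maxInputChannels"] > 0]

# With PyAudio installed, you would build `devices` like this:
#   import pyaudio
#   pa = pyaudio.PyAudio()
#   devices = [pa.get_device_info_by_index(i)
#              for i in range(pa.get_device_count())]

devices = [  # example data in PyAudio's device-info shape
    {"index": 0, "name": "HDMI Output", "maxInputChannels": 0},
    {"index": 2, "name": "USB Microphone", "maxInputChannels": 1},
]
print(input_device_indices(devices))  # → [2]
```

Only devices with `maxInputChannels > 0` can serve as the microphone; output-only devices (speakers, HDMI audio) are excluded.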

The audio system runs through a SoundReceiver class that manages background recording and transcription:

  • Continuous Listening: The robot monitors ambient audio and starts recording once the volume (measured as RMS) crosses a threshold.
  • Wake-word Detection: It records audio segments and transcribes them. If the defined wakeword is detected in the transcription, the entire phrase is set as the agent’s new active task.
  • Task Updates: While the robot is idle or performing a task, it continuously checks the task_queue for new verbal instructions.

When tts=True is set, the agent is granted access to a specialized say tool:

  • Communication: The LLM can proactively use the say tool to greet users, provide status updates (e.g., “I have found the blue notebook”), or ask for clarification.
  • Echo Prevention: To prevent the robot from hearing and transcribing its own voice, the SoundReceiver automatically pauses listening while the say tool is speaking and resumes once finished.
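The pause/resume coordination for echo prevention can be sketched with a shared flag. The `Speaker` class below is hypothetical (RoboCrew's `say` tool and `SoundReceiver` handle this internally); it only illustrates the ordering:

```python
import threading

class Speaker:
    """Sketch of say-tool / listener coordination (names are assumptions)."""

    def __init__(self):
        self.listening = threading.Event()
        self.listening.set()  # the listener loop runs while this is set

    def say(self, text):
        self.listening.clear()      # pause listening before speaking
        try:
            print(f"[TTS] {text}")  # real code would synthesize audio here
        finally:
            self.listening.set()    # resume listening once playback ends

speaker = Speaker()
speaker.say("I have found the blue notebook")
print(speaker.listening.is_set())  # → True (listening has resumed)
```

Clearing the flag before playback and restoring it in a `finally` block guarantees the listener resumes even if speech synthesis raises an error.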
A typical interaction looks like this:

  1. Idle Mode: The robot waits and listens.
  2. Command: You say, “Hey robot, bring me a beer”.
  3. Activation: The SoundReceiver identifies the wake-word “robot” and updates agent.task to “bring me a beer”.
  4. Feedback: The agent may use the say tool to respond: “Okay, looking for a beer now”.
  5. Execution: The agent enters its main loop to identify and retrieve the object.
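The activation step above can be sketched as a small parsing function. `extract_task` is hypothetical; it follows step 3's behavior of keeping only the command after the wake-word (the real SoundReceiver may keep the entire phrase instead):

```python
import re

def extract_task(transcript, wakeword="robot"):
    """Return the command following the wake-word, or None if absent."""
    pattern = rf"\b{re.escape(wakeword)}\b[,!.\s]*(.*)"
    m = re.search(pattern, transcript, re.IGNORECASE)
    if m and m.group(1):
        return m.group(1).strip()
    return None

print(extract_task("Hey robot, bring me a beer"))  # → "bring me a beer"
print(extract_task("just background chatter"))     # → None
```

Transcripts without the wake-word yield `None`, so idle chatter never overwrites `agent.task`.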