Audio and Voice Tutorial
RoboCrew allows your robot to listen for voice commands using wake-word detection and respond verbally using Text-to-Speech (TTS). This creates a hands-free “Intelligence Loop” where the robot perceives, reasons, and acts based on your spoken instructions.
1. Enabling Voice in the Agent
To use the audio features, provide a microphone device index and enable the TTS flag when you initialize the `LLMAgent` or `XLeRobotAgent`.
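If you are unsure which device index to pass, a small helper can list the audio devices that are able to record. This is a minimal sketch, assuming the index RoboCrew expects is the PortAudio device index (the numbering that both PyAudio and sounddevice report). `input_devices` and `list_microphones` are hypothetical helper names, not part of RoboCrew, and `list_microphones` needs the PyAudio package from the hardware setup step.

```python
def input_devices(device_infos):
    """Return (index, name) pairs for every device that can record audio.

    Each item is a dict shaped like PyAudio's get_device_info_by_index()
    result, which includes "index", "name" and "maxInputChannels" keys.
    """
    return [
        (info["index"], info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0  # skip output-only devices
    ]


def list_microphones():
    """Print the input-capable devices PortAudio can see (requires pyaudio)."""
    import pyaudio  # imported lazily so input_devices() works without PortAudio

    pa = pyaudio.PyAudio()
    try:
        infos = [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()  # release PortAudio even if enumeration fails
    for index, name in input_devices(infos):
        print(f"{index}: {name}")
```

Calling `list_microphones()` on the robot prints lines such as `2: USB Audio Device`; the number on the left is the candidate value for the microphone index.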
Basic Configuration
```python
agent = XLeRobotAgent(
    model="google_genai:gemini-3-flash-preview",
    tools=[...],
    sounddevice_index=2,  # 🎙️ Provide your microphone device index
    wakeword="Bob",       # 🗣️ Custom wake-word (default is "robot")
    tts=True,             # 🔊 Enable Text-to-Speech
    # ... other params
)
```

2. Audio Hardware Setup
Before running the code, ensure your system has the necessary audio libraries installed for handling microphone input:
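```bash
# PortAudio development headers, needed to build PyAudio
sudo apt install portaudio19-dev
# PyAudio for microphone capture; audioop-lts restores the audioop module removed in Python 3.13
pip install pyaudio audioop-lts
```

3. How Listening Works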
The audio system runs through a `SoundReceiver` class that manages background recording and transcription:
- Continuous Listening: The robot monitors the microphone input for sound whose volume, measured as RMS (root-mean-square) amplitude, exceeds a set threshold.
- Wake-word Detection: It records audio segments and transcribes them. If the configured `wakeword` appears in the transcription, the entire phrase is set as the agent’s new active task.
- Task Updates: Whether the robot is idle or already performing a task, it continuously checks the `task_queue` for new verbal instructions.
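The listening steps above can be sketched as three small pieces: an RMS volume gate, a wake-word check on the transcript, and a queue that receives the resulting task. This is an illustrative sketch, not RoboCrew’s `SoundReceiver`: the threshold value, the function names, and the use of `queue.Queue` are assumptions, and transcription itself is left out because this section does not name the speech-to-text engine. RMS is computed by hand with `array` and `math` so the sketch runs without audio hardware; the `audioop` module installed via `audioop-lts` provides `audioop.rms` for the same calculation.

```python
import math
import queue
import re
from array import array

RMS_THRESHOLD = 500  # hypothetical threshold for 16-bit samples; tune per microphone


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of signed 16-bit PCM audio in native byte order."""
    samples = array("h", frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(frame: bytes, threshold: float = RMS_THRESHOLD) -> bool:
    """Volume gate: only chunks louder than the threshold are worth recording."""
    return rms(frame) > threshold


def extract_task(transcript: str, wakeword: str = "robot"):
    """Return the whole phrase if it contains the wake-word as a word, else None."""
    pattern = rf"\b{re.escape(wakeword)}\b"
    if re.search(pattern, transcript, re.IGNORECASE) is None:
        return None
    return transcript.strip()


task_queue: "queue.Queue[str]" = queue.Queue()  # polled by the agent


def on_transcript(transcript: str, wakeword: str = "robot") -> None:
    """Producer side: push wake-word phrases onto the queue the agent checks."""
    task = extract_task(transcript, wakeword)
    if task is not None:
        task_queue.put(task)
```

Matching the wake-word on word boundaries keeps a phrase like “robotic arm” from triggering the default `robot` wake-word.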
4. How Speaking Works
When `tts=True` is set, the agent is granted access to a specialized `say` tool:
- Communication: The LLM can proactively use the `say` tool to greet users, provide status updates (e.g., “I have found the blue notebook”), or ask for clarification.
- Echo Prevention: To prevent the robot from hearing and transcribing its own voice, the `SoundReceiver` automatically pauses listening while the `say` tool is speaking and resumes once it finishes.
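The echo-prevention behavior can be pictured as a flag that the `say` tool clears before speaking and sets again afterwards, with the microphone pipeline discarding audio while the flag is cleared. The sketch below uses `threading.Event` for that flag. `EchoGuard`, `accept`, and the injected `speak` function are hypothetical names, and the TTS engine is passed in as a plain function because this section does not say which engine RoboCrew uses.

```python
import threading
from typing import Callable, Optional


class EchoGuard:
    """Hypothetical helper that mutes the microphone pipeline while TTS plays."""

    def __init__(self, speak: Callable[[str], None]):
        self._speak = speak  # injected TTS backend (assumption)
        self._listening = threading.Event()
        self._listening.set()  # start in listening mode

    @property
    def listening(self) -> bool:
        return self._listening.is_set()

    def accept(self, frame: bytes) -> Optional[bytes]:
        """Pass microphone frames through only while the robot is not speaking."""
        return frame if self._listening.is_set() else None

    def say(self, text: str) -> str:
        """Body of a `say`-style tool: pause listening, speak, then resume."""
        self._listening.clear()
        try:
            self._speak(text)
        finally:
            self._listening.set()  # resume even if the TTS backend raises
        return f"Said: {text}"
```

The `try`/`finally` matters: if the speech backend fails mid-sentence, listening still resumes instead of leaving the robot deaf.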
5. Voice Operation Sequence
- Idle Mode: The robot waits and listens.
- Command: You say, “Hey robot, bring me a beer”.
- Activation: The `SoundReceiver` identifies the wake-word “robot” and updates `agent.task` to “bring me a beer”.
- Feedback: The agent may use the `say` tool to respond: “Okay, looking for a beer now”.
- Execution: The agent enters its main loop to identify and retrieve the object.
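The sequence above can be sketched from the agent’s side as a loop that waits on a task queue (idle mode), acknowledges a new task (feedback), and then runs it (execution). This is a simplified, hypothetical sketch: `run_voice_loop`, `say`, and `execute` are illustrative names, the acknowledgement is hard-coded even though in RoboCrew the LLM decides whether to call the `say` tool, and unlike the behavior described in section 3, it only checks for new instructions between tasks rather than while one is running.

```python
import queue
from typing import Callable, List


def run_voice_loop(
    task_queue: "queue.Queue[str]",
    say: Callable[[str], None],
    execute: Callable[[str], None],
    max_idle_polls: int = 3,
    poll_interval: float = 0.01,
) -> List[str]:
    """Idle until a voice task arrives, acknowledge it, run it; return handled tasks."""
    handled: List[str] = []
    idle_polls = 0
    while idle_polls < max_idle_polls:
        try:
            # Idle Mode: block briefly waiting for an activated task
            task = task_queue.get(timeout=poll_interval)
        except queue.Empty:
            idle_polls += 1
            continue
        idle_polls = 0
        say(f"Okay, working on: {task}")  # Feedback (hard-coded here)
        execute(task)                     # Execution: the agent's main loop
        handled.append(task)
    return handled
```

`max_idle_polls` and `poll_interval` make the loop return after a short quiet period so it can be exercised in a test; a robot would keep looping indefinitely.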