VLA Manipulation Tutorial
RoboCrew allows your robot to perform complex physical tasks—like grabbing objects—by utilizing Vision-Language-Action (VLA) policies as tools. These tools bridge the gap between high-level LLM reasoning and low-level motor control.
1. Prerequisites
Before the agent can use an arm, you must have a VLA server running in a separate terminal. RoboCrew uses the LeRobot framework for this:
```sh
# Run the VLA server (example for ACT policy)
python -m lerobot.async_inference.policy_server --host=0.0.0.0 --port=8080
```

2. Creating a VLA Tool
You define a manipulation tool using the create_vla_single_arm_manipulation factory function. This binds a specific pretrained policy to a tool the AI agent can call.
Example Configuration
```python
from robocrew.robots.XLeRobot.tools import create_vla_single_arm_manipulation

pick_up_notebook = create_vla_single_arm_manipulation(
    tool_name="Grab_a_notebook",
    tool_description="Use this tool when you are very close to a notebook and looking straight at it.",
    task_prompt="Grab a notebook.",
    server_address="0.0.0.0:8080",
    policy_name="Grigorij/act_right-arm-grab-notebook-2",  # Path to pretrained policy
    policy_type="act",
    arm_port="/dev/arm_right",
    servo_controler=servo_controler,
    camera_config={
        "main": {"index_or_path": "/dev/camera_center"},
        "right_arm": {"index_or_path": "/dev/camera_right"}
    },
    main_camera_object=main_camera,
    execution_time=45  # Seconds to run the policy
)
```

3. Critical Manipulation Rules
For successful manipulation, the agent must follow these hardware-specific constraints defined in the system prompt:
- Arm Reach: The robot’s arm reach is very short (~30cm).
- Mode Requirement: Always switch to PRECISION mode before attempting any manipulation to tilt the camera down.
- The Green Line: In PRECISION mode, augmented “green lines” appear in the camera feed. The BASE of the target object must be BELOW this line before the tool is activated.
- Alignment: The target must be centered in the view. If it is off-center, the agent should use the `strafe` or `turn` tools to align first.
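As an illustration, the rules above can be expressed as a simple pre-manipulation check. The function, argument names, and the 10% centering tolerance below are hypothetical sketches, not part of the RoboCrew API:

```python
def pre_manipulation_check(object_center_x, object_base_y, green_line_y,
                           image_width, center_tolerance=0.1):
    """Hypothetical gate deciding whether a VLA tool may be triggered.

    object_center_x: horizontal pixel position of the target's center.
    object_base_y:   vertical pixel position of the target's base
                     (larger y = lower in the image).
    green_line_y:    vertical pixel position of the augmented green line.
    """
    # Rule: the BASE of the object must be BELOW the green line,
    # otherwise the target is still out of the short (~30 cm) arm reach.
    if object_base_y <= green_line_y:
        return "move_closer"

    # Rule: the target must be centered; otherwise strafe/turn first.
    offset = (object_center_x - image_width / 2) / image_width
    if offset < -center_tolerance:
        return "strafe_left"
    if offset > center_tolerance:
        return "strafe_right"

    return "activate_vla_tool"
```

With a 640-pixel-wide frame, a target whose base sits below the green line and whose center is within about 64 pixels of the image midline would pass the check; anything else maps to a corrective action first.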
4. Execution Workflow
- Release Camera: When a VLA tool is activated, it temporarily “steals” the camera from the LLM agent so the policy can process the video feed directly.
- Control Loop: The tool connects to the `RobotClient`, sends the `task_prompt` to the server, and executes actions for the specified `execution_time`.
- Restore State: After completion, the tool re-opens the camera for the agent and resets the robot’s head to the normal position.
- Verification: The agent is instructed to always verify the success of the manipulation via the camera feed after the tool finishes.
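The workflow above can be sketched as a lifecycle wrapper around the policy call. Everything here (function names, the callback arguments, the returned trace) is illustrative only and does not reflect the actual tool implementation:

```python
def run_vla_tool(release_camera, run_policy, restore_camera, reset_head,
                 task_prompt, execution_time):
    """Illustrative lifecycle of a VLA manipulation tool call."""
    trace = []
    release_camera()            # 1. take the camera from the LLM agent
    trace.append("release")
    try:
        # 2. control loop: stream the task prompt to the policy server
        #    and execute returned actions for the allotted time
        run_policy(task_prompt, execution_time)
        trace.append("policy")
    finally:
        restore_camera()        # 3. give the camera back to the agent...
        reset_head()            #    ...and reset the head position,
        trace.append("restore")  #   even if the policy raised an error
    return trace                # 4. the agent then verifies via the camera feed
```

The `try`/`finally` reflects the intent of the restore step: the camera and head must come back to the agent even when the policy fails, since the agent's final verification depends on having the camera feed again.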