
VLA Manipulation Tools

RoboCrew allows your robot to perform complex physical tasks—like grabbing objects—by utilizing Vision-Language-Action (VLA) policies as tools. These tools bridge the gap between high-level LLM reasoning and low-level motor control.

Before the agent can use an arm, you must have a VLA server running in a separate terminal. RoboCrew uses the LeRobot framework for this:

# Run the VLA server (example for ACT policy)
python -m lerobot.async_inference.policy_server --host=0.0.0.0 --port=8080

This method hosts the VLA policy on the robot’s onboard computer, providing the lowest possible latency for real-time motor control. It is the most robust option, but realistically a Raspberry Pi 5 is too weak for it.

When setting up the VLA tool, set server_address="0.0.0.0:8080".

By running the policy on a powerful workstation within the same local network, you can leverage high-end GPUs while keeping the robot lightweight.

When setting up the VLA tool, set server_address="your-server-ip".

C. External server (Tailscale recommendation)

Using a VPN like Tailscale allows you to connect to a remote cloud server or a distant workstation securely over the internet.

When setting up the VLA tool, set server_address="your-tailscale-server-ip".
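Since the three setups differ only in the value of server_address, one option is to read it from the environment so the same script can run against any of them. The following is a minimal sketch, not part of RoboCrew: the variable name VLA_SERVER_ADDRESS and the helper resolve_server_address are hypothetical, and appending port 8080 when none is given is an assumption based on the policy_server command above. It handles hostnames and IPv4 addresses only.

```python
import os

DEFAULT_PORT = 8080  # assumed to match the --port passed to the policy server


def resolve_server_address(env_var: str = "VLA_SERVER_ADDRESS",
                           default: str = "0.0.0.0:8080") -> str:
    """Return a "host:port" string for the VLA server.

    VLA_SERVER_ADDRESS is a hypothetical variable name, not a RoboCrew setting.
    """
    # Fall back to the onboard default when the variable is unset or empty.
    address = os.environ.get(env_var, "").strip() or default
    # Append the default port when only a host or IPv4 address was given.
    if ":" not in address:
        address = f"{address}:{DEFAULT_PORT}"
    return address
```

The returned string can then be passed as server_address when creating the tool.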

You define a manipulation tool using the create_vla_single_arm_manipulation factory function. This binds a specific pretrained policy to a tool the AI agent can call.

Remember to ensure server_address has the correct value.
from robocrew.robots.XLeRobot.tools import create_vla_single_arm_manipulation

# servo_controler and main_camera are assumed to be created earlier in your robot setup.
pick_up_notebook = create_vla_single_arm_manipulation(
    tool_name="Grab_a_notebook",
    tool_description="Use this tool when you are very close to a notebook and looking straight at it.",
    task_prompt="Grab a notebook.",
    server_address="0.0.0.0:8080",
    policy_name="Grigorij/act_right-arm-grab-notebook-2", # Path to pretrained policy
    policy_type="act",
    arm_port="/dev/arm_right",
    servo_controler=servo_controler,
    camera_config={
        "main": {"index_or_path": "/dev/camera_center"},
        "right_arm": {"index_or_path": "/dev/camera_right"}
    },
    main_camera_object=main_camera,
    execution_time=45  # Seconds to run the policy
)
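Before handing the tool above to the agent, it can help to confirm that the address in server_address is reachable at all. The sketch below uses only the Python standard library. The helper vla_server_reachable is hypothetical and not part of RoboCrew or LeRobot, and it assumes the address is written as "host:port". It only checks that a TCP connection can be opened, not that a policy is loaded on the other side.

```python
import socket


def vla_server_reachable(server_address: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to "host:port" can be opened."""
    # Split on the last colon so the port is always the final component.
    host, _, port = server_address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {server_address!r}")
    try:
        # Open and immediately close a connection; no data is exchanged.
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A False result usually means the policy server is not running, the IP or port is wrong, or a firewall or VPN is blocking the connection.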