
Google DeepMind Unveils Gemini Robotics 1.5 for Complex, Long-Horizon Tasks

Gemini Robotics 1.5 thinks before it acts, integrating text, images, and video for spatial reasoning and real-time adaptation. It's a significant leap towards more autonomous and adaptable robots.


Google DeepMind has unveiled Gemini Robotics 1.5, a robotic system built for complex, long-horizon tasks. The release splits embodied intelligence across two models: Gemini Robotics-ER 1.5 for high-level reasoning and Gemini Robotics 1.5 for low-level visuomotor control.
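
To make the division of labor concrete, the sketch below shows how such a two-model stack might be wired together: the reasoner decomposes an instruction into subtasks, and the controller executes each one. A minimal sketch only; every class and method name here is a hypothetical stand-in, not DeepMind's actual API.

```python
# Hypothetical two-model split: a high-level reasoner plans,
# a low-level controller acts. Names are illustrative.
from dataclasses import dataclass


@dataclass
class Subtask:
    description: str


class EmbodiedReasoner:
    """Stand-in for the high-level reasoning model (Gemini Robotics-ER 1.5)."""

    def plan(self, instruction: str) -> list[Subtask]:
        # A real reasoner would plan via multimodal inference;
        # we return a fixed decomposition for illustration.
        return [
            Subtask("locate the mug"),
            Subtask("grasp the mug"),
            Subtask("place the mug on the shelf"),
        ]


class VisuomotorController:
    """Stand-in for the low-level visuomotor model (Gemini Robotics 1.5)."""

    def execute(self, subtask: Subtask) -> bool:
        print(f"executing: {subtask.description}")
        return True  # a real controller would report success from sensor feedback


def run(instruction: str) -> None:
    reasoner, controller = EmbodiedReasoner(), VisuomotorController()
    for subtask in reasoner.plan(instruction):
        if not controller.execute(subtask):
            break  # on failure, the reasoner could replan


run("put the mug on the shelf")
```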

Gemini Robotics 1.5 stands out with its 'think-before-act' approach, converting instructions and percepts into motor commands. It ingests images, video, and text, enabling spatial reasoning and real-time adaptation. The system can explain its decisions and integrate external tools like internet search.
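
The 'think-before-act' loop can be pictured in a few lines: the model drafts an explicit reasoning trace from its percepts, optionally consulting a tool, before any motor command is issued. That intermediate trace is also what makes the robot's decisions explainable. The function names and toy search tool below are assumptions for illustration, not the system's real interface.

```python
# Illustrative think-before-act loop: reason first, act second.
# The tool call and all function names are invented for this sketch.
def search_tool(query: str) -> str:
    return f"(web results for {query!r})"  # placeholder for internet search


def think(instruction: str, percepts: dict) -> str:
    # Produce an inspectable reasoning trace before committing to motion.
    context = search_tool("local recycling rules") if "sort" in instruction else ""
    return f"Plan from {sorted(percepts)}: pick items, then bin them. {context}"


def act(reasoning_trace: str) -> list[str]:
    # The trace is converted into low-level motor commands.
    return ["move_arm(target='bottle')", "close_gripper()", "move_arm(target='bin')"]


percepts = {"rgb_image": "...", "video_frames": "...", "text": "sort the recycling"}
trace = think("sort the recycling", percepts)
print(trace)  # the explainable intermediate step
for command in act(trace):
    print("motor:", command)
```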

The vision-language-action (VLA) controller in Gemini Robotics 1.5 makes steadier progress on multi-step tasks, addressing earlier challenges in planning, success verification, and generalization across different robotic platforms. Motion Transfer lets skills learned on one platform carry over to another, reducing data-collection costs and narrowing sim-to-real gaps.
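
One way to picture Motion Transfer is as routing trajectories through a shared action representation, so a robot with a different embodiment can reuse them. The linear mappings in this sketch stand in for what would in practice be learned; all names, dimensions, and the use of plain matrices are assumptions made for the example.

```python
# Sketch of cross-platform motion transfer via a shared action space.
# Linear adapters are placeholders for learned mappings.
import numpy as np

rng = np.random.default_rng(0)
to_latent_from_arm_a = rng.standard_normal((8, 7))  # 7-DoF arm A -> latent
to_arm_b_from_latent = rng.standard_normal((6, 8))  # latent -> 6-DoF arm B


def transfer(trajectory_a: np.ndarray) -> np.ndarray:
    """Map a (T, 7) trajectory from platform A to (T, 6) commands for B."""
    latent = trajectory_a @ to_latent_from_arm_a.T  # (T, 8) shared representation
    return latent @ to_arm_b_from_latent.T          # (T, 6) retargeted commands


demo = rng.standard_normal((100, 7))  # demonstration collected on platform A
print(transfer(demo).shape)           # (100, 6): reusable on platform B
```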

DeepMind has also expanded its evaluation suites to catch hallucinated affordances, including references to nonexistent objects, before actuation. Gemini Robotics 1.5 surpasses prior baselines in instruction following, action generalization, and task generalization across three platforms.
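
A pre-actuation grounding check of the kind such suites target might look like the following: a plan step is rejected if it references an object the perception stack never detected. The scene format and the toy vocabulary are invented for this example.

```python
# Sketch of a pre-actuation grounding check against hallucinated objects.
OBJECT_VOCABULARY = {"mug", "shelf", "drawer", "bottle"}  # toy vocabulary


def grounded(plan_step: str, detected_objects: set[str]) -> bool:
    """True only if every object the step mentions was actually detected."""
    referenced = {word for word in plan_step.lower().split()
                  if word in OBJECT_VOCABULARY}
    return referenced <= detected_objects


scene = {"mug", "shelf"}
print(grounded("grasp the mug", scene))    # True: the mug exists in the scene
print(grounded("open the drawer", scene))  # False: hallucinated object, block it
```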

Gemini Robotics 1.5 targets long-horizon, real-world tasks, and its Motion Transfer mechanism reuses data across heterogeneous platforms. With these capabilities, the system paves the way for more autonomous, adaptable robots in complex environments.
