In the world of technology, Google DeepMind has dropped another bombshell: they've put a system built on Gemini 1.5 Pro onto a robot. This isn't just an upgrade; it gives the robot the superpower of remembering and navigating its surroundings, essentially an all-seeing eye for robots.

Imagine a robot operating across an area of nearly 9,000 square feet, completing 57 different navigation tasks with a success rate of around 90%. And these aren't simple tasks. Asked to find "a place to draw," the robot not only understood the request but led the person straight to a large whiteboard, handling the exchange about as reliably as a human guide would.

The amazing part of this system is its multimodal long context window: the robot can not only remember key locations but also interpret human instructions and video walkthroughs, and even apply common-sense reasoning. In the whiteboard example, it didn't just parse "the place to draw"; it inferred that a spot with a large whiteboard would satisfy the request.

Moreover, these robots had already familiarized themselves with the office environment in earlier sessions, learning its spatial layout through "multimodal instruction navigation demonstrations." The DeepMind team also uses a hierarchical vision-language-action (VLA) approach that lets the robots follow written instructions, sketches, and gesture-based commands.
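The hierarchical idea described above can be illustrated with a toy sketch: a high-level step matches a free-form instruction to a location seen during the demonstration tour, and a low-level step plans a route to it over a topological graph built from that tour. Everything here is illustrative, not DeepMind's actual code: the graph, the frame descriptions, and the word-overlap "model" are stand-ins for a real long-context VLM and navigation policy.

```python
from collections import deque

# Topological map from a pretend office tour: frames as nodes,
# edges between physically adjacent frames (all names are made up).
TOUR_GRAPH = {
    "lobby": ["hallway"],
    "hallway": ["lobby", "kitchen", "whiteboard_area"],
    "kitchen": ["hallway"],
    "whiteboard_area": ["hallway"],
}

# Stand-in for what a VLM would "remember" about each tour frame.
FRAME_DESCRIPTIONS = {
    "lobby": "entrance with couches",
    "kitchen": "coffee machine and snacks",
    "whiteboard_area": "large whiteboard for drawing",
}

def pick_goal_frame(instruction):
    """High-level step (toy VLM): pick the tour frame whose description
    shares the most words with the user's instruction."""
    words = set(instruction.lower().split())
    return max(FRAME_DESCRIPTIONS,
               key=lambda f: len(words & set(FRAME_DESCRIPTIONS[f].split())))

def plan_path(start, goal):
    """Low-level step: shortest path over the topological graph via BFS."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in TOUR_GRAPH[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

goal = pick_goal_frame("take me somewhere i can draw on a whiteboard")
print(plan_path("lobby", goal))  # ['lobby', 'hallway', 'whiteboard_area']
```

The split mirrors the article's description: the expensive reasoning (understanding "a place to draw") happens once at the top level, while the navigation itself runs on a cheap, precomputed map of the environment.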

At its core, the system lets robots move freely through complex spaces without constant human guidance. They can remember the environment, understand instructions, and complete tasks on their own, which makes them far more flexible and useful in practical applications.

In summary, Google DeepMind's technology not only makes robots smarter but also enables them to better serve humans in the real world. It's as if a new door has been opened for robots, letting them enter our lives as partners in work and in exploring the world. Perhaps in the future, robots will no longer be cold machines but intelligent companions in our daily lives.