At the recent Axios AI+ Summit, Google DeepMind CEO Demis Hassabis shared his outlook for the AI field in the coming year. He predicted that 2026 will be a pivotal year, marked by rapid advances in multimodal models, interactive video worlds, and more reliable AI agents.
Hassabis emphasized that DeepMind's latest AI model, Gemini, has made significant progress in multimodal capabilities. The model, he said, can not only describe a film's plot but also grasp the deeper meaning of a scene: in "Fight Club," for example, the AI interpreted a character removing a ring as a philosophical renunciation of everyday life. This depth of understanding lets the AI generate more complex outputs, such as infographics, that earlier systems could not produce.
He also said that within a year AI agents will "approach" the ability to handle complex tasks autonomously, a development consistent with the timeline he outlined in May 2024. DeepMind's goal is a general-purpose assistant that works across devices to help users manage daily life. To that end, DeepMind is also building a "world model" called Genie 3, which can generate interactive, explorable video spaces that let users immerse themselves in virtual worlds.
Key Points:
🌟 Advances in multimodal models will drive AI's ability to understand and generate complex content.
🛠️ AI agents will soon approach the ability to handle complex tasks autonomously.
🌍 DeepMind is developing interactive video spaces that offer users a new kind of immersive experience.
