
Research Finds Google AI Model Veo-3 Can Generate Realistic Surgical Videos but Lacks Medical Logic Understanding

Published in Latest AI News

Time: Nov 10, 2025

Read: 4 minutes

Researchers recently tested Veo-3, Google's latest video generation AI model, and found that although it can generate very realistic surgical videos, it has significant shortcomings in understanding medical procedures. In the study, the research team gave Veo-3 a single surgical image and asked it to predict what would happen in the next 8 seconds of the operation. To evaluate the results, they created a benchmark called SurgVeo, which includes 50 real laparoscopic and neurosurgical videos.

The research team invited four experienced surgeons to rate the AI-generated videos independently on four criteria: visual plausibility, instrument operation, tissue response, and surgical logic. The surgeons rated the visual quality of Veo-3's videos highly, calling them "stunningly clear," but closer analysis showed that the model fell far short on medical logic. In the laparoscopic surgery test, Veo-3 scored 3.72 for visual plausibility, but only 1.78 for instrument operation, 1.64 for tissue response, and 1.61 for surgical logic.
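The gap between appearance and substance in the laparoscopic scores can be made explicit with a little arithmetic. The sketch below uses only the four mean scores reported above; the function name `gap_to_visual` and the dictionary layout are illustrative assumptions, not part of the study, and the rating scale is not stated in this article.

```python
# Mean laparoscopic scores as reported in the article.
# The rating scale itself is not stated there, so none is assumed here.
reported_scores = {
    "visual plausibility": 3.72,
    "instrument operation": 1.78,
    "tissue response": 1.64,
    "surgical logic": 1.61,
}


def gap_to_visual(scores, visual_key="visual plausibility"):
    """Return how far each non-visual criterion falls below the visual score.

    Hypothetical helper for illustration only.
    """
    visual = scores[visual_key]
    return {
        name: round(visual - value, 2)
        for name, value in scores.items()
        if name != visual_key
    }


gaps = gap_to_visual(reported_scores)
for name, gap in gaps.items():
    print(f"{name}: {gap:.2f} points below visual plausibility")
```

Each of the three medical criteria lands roughly two points below the visual score, which is the pattern the surgeons' ratings describe.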

Veo-3 performed even worse in neurosurgical scenarios, scoring only 1.13 for surgical logic after 8 seconds. The research team found that over 93% of the errors stemmed from medical logic issues, such as inventing non-existent surgical instruments and showing tissue responses that violate physiological laws. Giving the model more context, such as the type of surgery and the specific procedural stage, did not significantly improve its performance.

The study shows that current video generation AI is still far from truly understanding medical procedures. Although such systems may one day be used for doctor training and preoperative planning, existing models are not yet safe or reliable enough for that purpose. The research team plans to open-source the SurgVeo dataset to promote academic progress in AI's medical understanding. The study also warns that using such generated videos in medical training poses serious risks, since they could mislead learners and teach incorrect surgical techniques.

Key Points:
🌟 The Veo-3 model can generate realistic surgical videos, but lacks understanding of medical logic.
🔍 Over 93% of the errors stem from medical logic problems, seriously affecting the accuracy of the videos.
📈 The research team plans to open-source the dataset to promote progress in AI's medical understanding.

