Google has recently open-sourced MedGemma 1.5, its medical AI model. The central advance is moving beyond traditional 2D images: the model can now process high-dimensional medical data, and it shows significant progress in several key medical scenarios.
The capability upgrade in MedGemma 1.5 is broad. It natively supports analysis of 3D CT and MRI scans, directly processing volumes made up of dozens of slices. It supports whole-slide digital pathology analysis, searching for signs of lesions at the microscopic level. In chest X-ray analysis, it can mark anatomical structures and lesion locations with bounding boxes instead of giving only a general conclusion. It can also compare studies taken at multiple time points to track whether a condition is improving, stable, or worsening. In addition, its understanding of PDF-format electronic medical records and lab reports has improved significantly, allowing it to extract key structured data accurately.

The performance gains are equally notable. Compared with the previous version, MedGemma 1 4B, version 1.5 achieves an 11% absolute improvement in accuracy on 3D MRI disease classification, a 47% increase in macro F1 score on whole-slide pathology images, a 35% increase in intersection-over-union (IoU) for anatomical localization in chest X-rays, and a 22% increase in accuracy on electronic medical record Q&A. Notably, these improvements were achieved with the parameter count unchanged at 4 billion, which points to high computational efficiency.
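
For readers unfamiliar with the two less common metrics named above, the following is a minimal sketch of how IoU for bounding boxes and macro F1 are typically computed. It is illustrative only: the function names `box_iou` and `macro_f1` and the `(x_min, y_min, x_max, y_max)` box layout are assumptions made for this example, not details of Google's evaluation code.

```python
from typing import Hashable, Sequence, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Overlap width and height, clamped at zero when the boxes do not touch.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred), key=str)
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Because macro F1 averages classes with equal weight, rare classes count as much as common ones, which is one reason it is often preferred for imbalanced pathology labels.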

On the technical side, the team trained on a large volume of paired medical image-text data covering radiology, dermatology, pathology, and synthetic electronic health records, and designed a preprocessing method that splits a 3D CT scan into up to 85 sequential images. Late in training, they also used domain expert models for knowledge distillation, directly "transferring" specialized knowledge to the model.
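
The slice limit can be pictured with a small sketch. The article gives only the cap of 85 images, not how slices are chosen, so the evenly spaced sampling below is an assumption, and `select_slice_indices` is a hypothetical helper name rather than part of any MedGemma tooling.

```python
from typing import List


def select_slice_indices(num_slices: int, max_slices: int = 85) -> List[int]:
    """Pick at most `max_slices` slice indices from a volume, in order.

    Assumption: slices are sampled at even spacing when the volume is
    larger than the cap; the actual preprocessing may differ.
    """
    if num_slices <= 0 or max_slices <= 0:
        return []
    if num_slices <= max_slices:
        # Small volumes are kept whole.
        return list(range(num_slices))
    if max_slices == 1:
        # Degenerate cap: take the middle slice.
        return [num_slices // 2]
    # Integer arithmetic keeps the first and last slice and avoids
    # floating-point rounding; indices are strictly increasing because
    # the spacing is greater than one.
    last = num_slices - 1
    return [(i * last) // (max_slices - 1) for i in range(max_slices)]


if __name__ == "__main__":
    indices = select_slice_indices(300)
    print(len(indices), indices[:5], indices[-1])
```

Each selected index would then be rendered as a 2D image and passed to the model in order, so that the sequence preserves the spatial ordering of the scan.
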
It should be noted, however, that MedGemma 1.5 is not a ready-to-use clinical decision-making tool. Google positions it as a foundation for developers to fine-tune further, and actual clinical deployment requires specialized training for the specific scenario. In addition, as the model evolved toward a "medical generalist," it showed a slight decline on some older, less common visual question-answering benchmarks, a trade-off that came with expanding its capabilities across the board.
Paper link: https://www.alphaxiv.org/abs/2604.05081
