Google DeepMind's research team has recently released Gemma Scope 2, an open suite of interpretability tools for examining how the Gemma 3 language models process information internally. The suite covers the full model family, from 270 million to 27 billion parameters.

The core goal of the suite is to give AI safety and alignment teams a practical way to trace model behavior back to internal features, rather than relying solely on input and output analysis. When a Gemma 3 model is jailbroken, hallucinates, or behaves sycophantically, researchers can use Gemma Scope 2 to check which internal features are active and how those activations flow through the network.
Gemma Scope 2 is a comprehensive, open collection of sparse autoencoders and related tools trained on the internal activations of the Gemma 3 model series. Sparse autoencoders (SAEs) act like microscopes: they decompose high-dimensional activations into a sparse set of human-inspectable features that correspond to concepts or behaviors. Producing Gemma Scope 2 required storing approximately 110 PB of activation data and training over 1 trillion parameters in total across all of the interpretability models.
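To make the "microscope" idea concrete, here is a minimal sketch of a sparse autoencoder's forward pass in NumPy. It is an illustration only, not the Gemma Scope 2 implementation: the weights are random rather than trained, the dimensions are toy values, and the top-k sparsity rule and all names (`sae_encode`, `sae_decode`, `W_enc`, `W_dec`) are assumptions chosen for clarity.

```python
import numpy as np

def sae_encode(x, W_enc, b_enc, k):
    """Map a dense activation vector to a sparse, non-negative feature vector.

    Sparsity here is enforced by keeping only the k largest ReLU outputs
    (an illustrative choice; real SAEs may use other sparsity mechanisms).
    """
    pre = np.maximum(W_enc @ x + b_enc, 0.0)   # ReLU pre-activations
    features = np.zeros_like(pre)
    top = np.argsort(pre)[-k:]                 # indices of the k largest values
    features[top] = pre[top]
    return features

def sae_decode(features, W_dec, b_dec):
    """Reconstruct the activation as a weighted sum of feature directions."""
    return W_dec @ features + b_dec

# Toy sizes: a 16-dimensional "activation" expanded into 64 candidate features.
rng = np.random.default_rng(0)
d_model, n_features, k = 16, 64, 4
W_enc = rng.normal(size=(n_features, d_model)) / np.sqrt(d_model)
b_enc = np.zeros(n_features)
W_dec = rng.normal(size=(d_model, n_features)) / np.sqrt(n_features)
b_dec = np.zeros(d_model)

x = rng.normal(size=d_model)            # stand-in for one model activation
features = sae_encode(x, W_enc, b_enc, k)
x_hat = sae_decode(features, W_dec, b_dec)

# The few active indices are what a researcher would inspect and label.
active = np.flatnonzero(features)
print("active feature indices:", active)
```

In a trained SAE, each active index would ideally correspond to a recognizable concept, and the reconstruction `x_hat` would closely match `x`; with the random weights used here, only the mechanics are meaningful.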
Compared with the original Gemma Scope, Gemma Scope 2 has been expanded in four main ways. First, it covers the entire Gemma 3 series, including models as large as 27 billion parameters, which makes it well suited to studying emergent behaviors that only appear in larger models.
Second, Gemma Scope 2 includes sparse autoencoders and transcoders trained on every layer of Gemma 3, which helps track multi-step computations across layers. Third, the suite applies the "Matryoshka" training technique, which helps sparse autoencoders learn more useful and stable features and reduces some of the defects found in the earlier release. Finally, the suite provides specialized interpretability tools for chat-tuned Gemma 3 models, enabling analysis of multi-step behaviors such as jailbreaks, refusal mechanisms, and chain-of-thought faithfulness.
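The article does not describe how the Matryoshka technique works. As a rough illustration of the general idea from published work on Matryoshka-style sparse autoencoders, the sketch below scores reconstructions from nested prefixes of the feature dictionary, so that the earliest features must already reconstruct the activation reasonably well on their own. Whether Gemma Scope 2 uses exactly this form is an assumption; the function name, the prefix sizes, and the averaging over prefixes are hypothetical choices.

```python
import numpy as np

def matryoshka_loss(x, features, W_dec, b_dec, prefix_sizes):
    """Average reconstruction error over nested prefixes of the feature set.

    For each prefix size m, only the first m features (and the matching
    decoder columns) are used to reconstruct x.
    """
    total = 0.0
    for m in prefix_sizes:
        x_hat = W_dec[:, :m] @ features[:m] + b_dec
        total += np.mean((x - x_hat) ** 2)
    return total / len(prefix_sizes)

# Toy data: random decoder and random non-negative features (not trained).
rng = np.random.default_rng(0)
d_model, n_features = 16, 64
W_dec = rng.normal(size=(d_model, n_features)) / np.sqrt(n_features)
b_dec = np.zeros(d_model)
features = np.maximum(rng.normal(size=n_features), 0.0)
x = rng.normal(size=d_model)

# Nested "dolls": the first 8, 16, 32, and all 64 features.
loss = matryoshka_loss(x, features, W_dec, b_dec, prefix_sizes=[8, 16, 32, 64])
print("matryoshka loss:", loss)
```

With a single prefix equal to the full dictionary size, this reduces to an ordinary mean-squared reconstruction loss; the extra, smaller prefixes are what push general features toward the front of the dictionary.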
Project Introduction: https://deepmind.google/blog/gemma-scope-2-helping-the-ai-safety-community-deepen-understanding-of-complex-language-model-behavior/
Key Points:
🔍 Gemma Scope 2 is an open interpretability tool suite that supports Gemma 3 models ranging from 270 million to 27 billion parameters.
🛠️ The new version includes sparse autoencoders and transcoders that help analyze the model's internal features and behaviors.
🔒 The suite is especially relevant to AI safety, enabling in-depth study of hallucinations, jailbreaks, and other safety-related behaviors.
