The InternLM team has officially released Intern-S1-mini, its open-source lightweight multimodal reasoning model. With only about 8B parameters, it pairs the Qwen3-8B language model with the 0.3B InternViT visual encoder, packing strong multimodal processing capability into a compact, flexible model.
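For orientation, here is a minimal sketch of loading the model with Hugging Face Transformers. The repository id `internlm/Intern-S1-mini` and the use of `trust_remote_code` are assumptions based on how InternLM typically publishes its models, not details confirmed in this article.

```python
# Minimal loading sketch, assuming the model is published on Hugging Face
# as "internlm/Intern-S1-mini" with custom modeling code.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "internlm/Intern-S1-mini"  # assumed repository id

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps an 8B model within one GPU
    device_map="auto",
    trust_remote_code=True,
)
```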
Intern-S1-mini was pre-trained at large scale on more than 5 trillion tokens in total, of which over 2.5 trillion came from scientific fields such as chemistry, physics, biology, and materials science. As a result, the model handles not only conventional text and visual input but can also interpret complex molecular formulas and protein sequences and plan synthesis routes, pointing to broad application potential in scientific research.
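As an illustration of the scientific use case, the sketch below passes a molecule written as a SMILES string in an ordinary text prompt, continuing the hypothetical setup above. The chat-message schema and the `apply_chat_template` call follow the convention of recent multimodal models in Transformers and are assumptions, not documented specifics of Intern-S1-mini.

```python
# Hypothetical prompt: ask the model to reason about a molecule given as SMILES.
# Reuses `model` and `processor` from the loading sketch above.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Propose a plausible synthesis route for aspirin: "
                        "CC(=O)Oc1ccccc1C(=O)O",
            },
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```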

According to the officially published benchmark results, Intern-S1-mini outperforms comparable models across multiple task areas. It posts impressive results on MMLU-Pro, MMMU, GPQA, and AIME 2024/2025, and scores 76.47 on ChemBench, 61.55 on MatBench, and 58.47 on ProteinLMBench. These results underline the model's capability while also demonstrating its support for text, image, and video inputs.
Interestingly, Intern-S1-mini runs in "thinking mode" by default, and users can toggle it with a simple switch (enable_thinking), as sketched below. This design makes the model more interactive and gives users a more flexible experience.
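The snippet below shows one plausible way the toggle could be used, continuing the earlier sketch. It assumes `enable_thinking` is exposed as a chat-template flag, as in Qwen3-style models; the exact mechanism may differ in the official documentation.

```python
# Assumed usage: thinking mode is on by default; pass enable_thinking=False
# through the chat template to turn it off. Reuses `messages`, `processor`,
# and `model` from the sketches above.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    enable_thinking=False,  # disable the default "thinking mode"
).to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
```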
In a fast-moving field, the release of Intern-S1-mini gives researchers and developers a new tool for pursuing innovation and breakthroughs in multimodal reasoning. Whether in basic research or practical applications, this model will be one worth watching.
