Tongyi Lab has announced the official release of the small-sized models in its Qwen3.5 series, its new generation of large language models. The release covers four parameter scales: 0.8B, 2B, 4B, and 9B. Through aggressive performance optimization, the models aim to lower the barrier to applying AI technology, enabling low-cost, efficient deployment from edge devices to vertical scenarios.

According to the announcement, all models in the series are built on the unified Qwen3.5 base. In contrast to large models that chase maximum parameter counts, these small-size members emphasize light weight and high adaptability. The 0.8B and 2B models are designed for edge devices, delivering an extremely light footprint and millisecond-level response on smartphones and embedded hardware. The 4B model stands out for its multimodal capabilities and is positioned as an ideal choice for building lightweight agents. The 9B model, despite its compact size, performs close to much larger models and can handle complex logical reasoning tasks.

To further support the developer ecosystem, Tongyi Lab announced that the series is released under the Apache 2.0 license, making the models open source and commercially usable. Developers can freely perform LoRA or full-parameter fine-tuning, and a common consumer-grade GPU is enough to begin task adaptation. This greatly reduces the time and cost for individual developers and small and medium-sized enterprises to validate ideas and build vertical applications.
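The reason LoRA fine-tuning fits on a consumer-grade GPU is that it trains only a low-rank update on top of frozen pretrained weights instead of updating every parameter. A minimal NumPy sketch of the idea follows; the dimensions, rank, and scaling factor here are illustrative assumptions, not values taken from Qwen3.5:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for a single weight matrix in a small model layer.
d_out, d_in, r = 64, 64, 4  # r is the LoRA rank, with r much smaller than d_in

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight (never updated)
A = rng.normal(size=(r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                # initialized to zero, so the update starts at 0

def lora_forward(x, alpha=8.0):
    """Forward pass with the effective weight W + (alpha / r) * B @ A."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B = 0, the adapted layer reproduces the frozen model's output exactly.
assert np.allclose(lora_forward(x), W @ x)

# Only A and B are trained: a small fraction of W's parameter count.
full_params = W.size
lora_params = A.size + B.size
print(f"trainable: {lora_params} vs full fine-tune: {full_params} "
      f"({lora_params / full_params:.1%})")
```

In this toy configuration the adapter trains 512 parameters against 4,096 for a full fine-tune of the same matrix; in a real multi-billion-parameter model the ratio is far smaller still, which is what keeps memory requirements within reach of a single consumer GPU.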

