According to AIbase, early this morning Zhipu AI officially announced the open-source release of its latest "hybrid thinking" model, GLM-4.7-Flash. Positioned as the strongest contender in the 30B class, the model tops the performance rankings for its size while retaining the advantages of lightweight deployment, with strong reasoning and coding capabilities.

Performance Leader: The "All-Rounder Champion" of the 30B Class
GLM-4.7-Flash adopts a 30B-A3B MoE (Mixture of Experts) architecture. This means the total parameter count is 30 billion, but only about 3 billion parameters are activated for each token it processes. This design strikes a balance between resource usage and processing power.
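To make the "total vs. active parameters" idea concrete, here is a minimal top-k routing sketch in PyTorch. It is an illustration of the general MoE technique, not Zhipu's actual implementation; the dimensions, expert count, and k value are arbitrary and far smaller than GLM-4.7-Flash's real configuration.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Minimal top-k MoE layer: only k of n experts run per token,
    so the active parameter count is a fraction of the total."""
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores every expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                             # x: (tokens, d_model)
        scores = self.router(x)                       # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)    # keep only k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                    # naive dispatch: clarity over speed
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e              # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TinyMoE()
print(layer(torch.randn(5, 64)).shape)  # torch.Size([5, 64])
```

Each token here runs through only 2 of the 8 expert MLPs, which is why a MoE model can have a large total parameter count while keeping per-token compute close to that of a much smaller dense model.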
In a series of benchmark tests, GLM-4.7-Flash performed strongly, surpassing Alibaba's Qwen3-30B-A3B-Thinking-2507 and OpenAI's GPT-OSS-20B:
Software Engineering (SWE-bench Verified): 59.2, demonstrating top-tier code-repair capability.
Math and Reasoning: 91.6 on AIME25 and 75.2 on GPQA (expert-level QA).
Tool Collaboration: 79.5 on τ²-Bench and 42.8 on BrowseComp, showing strong competitiveness in agent scenarios.
Developer-Friendly: Flexible Local Deployment
The model emphasizes lightness and practicality, making it particularly well suited to agent applications in local or private-cloud environments. To ensure stable performance, GLM-4.7-Flash is supported by mainstream inference frameworks:
vLLM and SGLang: Both already provide support in their main branches. With vLLM, developers can tune concurrency and decoding speed through parameters such as tensor-parallel-size and speculative-config; SGLang also supports the EAGLE algorithm to further improve inference efficiency (see the first sketch after this list).
Hugging Face: Supports direct invocation through the transformers library, lowering the barrier to rapid experimentation and integration (second sketch below).
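As a starting point, here is a minimal offline-inference sketch using vLLM's Python API. The repo id zai-org/GLM-4.7-Flash is a guess at the Hugging Face name and should be checked against Zhipu's official page; the exact speculative-config schema varies by vLLM version, so only tensor parallelism is shown here.

```python
from vllm import LLM, SamplingParams

# Hypothetical repo id; verify against Zhipu's Hugging Face organization.
llm = LLM(
    model="zai-org/GLM-4.7-Flash",
    tensor_parallel_size=2,   # shard the weights across 2 GPUs
    trust_remote_code=True,   # may be unnecessary once support is fully upstreamed
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Write a Python function that reverses a string."], params)
print(outputs[0].outputs[0].text)
```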
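And a minimal transformers sketch under the same hypothetical repo id. It assumes the model card ships a chat template, which is standard for Zhipu's recent releases but not confirmed here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zai-org/GLM-4.7-Flash"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Explain MoE routing in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=200)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```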
Market Feedback: Performance Upgrade Without Compromising Lightness
The community has responded enthusiastically to the release. Early users generally report that GLM-4.7-Flash noticeably improves "perceived speed" in real tasks without increasing the hardware burden. One developer commented: "Its performance in coding and tool calls makes local AI assistants truly useful. This balance between performance and efficiency is exactly what we need."
