Running large models locally used to mean compromising on performance or functionality. With the release of the Qwen3.6 series, that assumption is starting to break down. Developer Piotr Migdał recently tested Qwen3.6 27B in depth on a MacBook with an M5 Max chip and 128GB of memory, and the results are striking: the model is not merely "usable" but a capable general-purpose tool that does not sacrifice the user experience.
On the technical side, the model is remarkably efficient. Using the 8-bit GGUF quantization served through llama.cpp, with optimizations such as multi-token prediction (MTP) and flash attention enabled, Qwen3.6 27B sustains about 32 tok/s with a 64K context. The 35B-A3B MoE variant exceeds 100 tok/s under the same setup.
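To put those numbers in perspective, here is a rough back-of-the-envelope sketch of how much memory the 8-bit weights occupy and how long a response takes at the quoted speeds. It assumes the 8-bit file uses llama.cpp's Q8_0 format (32 int8 weights plus one fp16 scale per block, so 8.5 bits per weight), ignores the KV cache and runtime overhead, and uses a hypothetical 2,000-token response. It also illustrates why the MoE variant is faster without being smaller: all 35B weights must still sit in memory, but only about 3B are active per token.

```python
# Back-of-the-envelope estimates for a locally served quantized model.
# Assumption: the 8-bit GGUF is Q8_0, which stores each block of 32 weights
# as 32 int8 values plus one fp16 scale = 34 bytes per 32 weights.
# KV cache, activations, and runtime overhead are NOT included.

Q8_0_BITS_PER_WEIGHT = 34 * 8 / 32  # = 8.5 bits per weight


def weight_memory_gb(n_params: float, bits_per_weight: float = Q8_0_BITS_PER_WEIGHT) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Time to decode n_tokens at a steady rate (prompt processing ignored)."""
    return n_tokens / tokens_per_second


if __name__ == "__main__":
    # The MoE model must hold all 35B weights even though ~3B are active per token.
    print(f"27B dense at Q8_0: ~{weight_memory_gb(27e9):.1f} GB of weights")
    print(f"35B-A3B MoE at Q8_0: ~{weight_memory_gb(35e9):.1f} GB of weights")
    # Throughput figures from the test above; 2,000 tokens is a hypothetical response length.
    for label, rate in [("27B dense @ 32 tok/s", 32), ("35B-A3B MoE @ 100 tok/s", 100)]:
        print(f"{label}: 2,000 tokens in ~{generation_seconds(2000, rate):.1f} s")
```

Both weight files (roughly 29 GB and 37 GB) fit comfortably in 128GB of memory, leaving room for a 64K-token KV cache.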

The real breakthrough is the intelligence level. Qwen3.6 27B scores 37 points on Artificial Analysis's index, on par with GPT-5 or Claude Sonnet 4.5 as of mid-2025. By comparison, Gemma 4 31B, previously the go-to local coding model, scores only 29. In other words, within roughly a year, local models have caught up to nearly the level of the top paid API models from a year earlier.
The model also held up well in practical tests. Whether writing an eight-line poem with demanding rhyme constraints or generating a hexagonal Minesweeper game set up with pnpm, Qwen3.6 27B completed each task with high quality on the first attempt. For developers, the biggest advantage of a local model is control: there is no risk of the service being withdrawn and no API bill, because the model runs entirely from your own drive.
This marks an important turning point: once open-source models running on consumer hardware are smart enough to compete with top paid models, developers can confidently build high-performance AI into their personal workflows. For anyone who values both productivity and privacy, it is one of the most noteworthy technology choices available today.
