On May 13 (Beijing time), the local AI ecosystem on Apple Silicon took a notable step forward: the oMLX framework released version 0.3.9.dev2, integrating several cutting-edge optimizations that significantly improve the speed and usability of local large models for image and text processing, and further strengthening the practical competitiveness of Apple's edge-side AI.

Core Technology Upgrade: Full Support for Gemma4 Visual Path
The headline change in this release is the direct integration of Gemma4's MTP visual path together with the DFlash engine and ParoQuant quantization. This combination substantially speeds up multimodal decoding for images and text, lowering the latency barrier to running multimodal large models locally. The "experience gap" long criticized in local AI is noticeably narrowed by this round of optimization.
Usability Leap: omlx launch copilot for Instant Access to Top Tools
To lower the barrier for developers and users, oMLX adds an omlx launch copilot command. With one click, users can connect to mainstream AI tools such as Claude, Codex, and OpenClaw, enabling seamless collaboration between local and cloud services. This markedly improves local AI integration, making out-of-the-box usage a reality.
Resource Management: oQ Smart Proxy Tackles VRAM Limitations
Addressing real deployment pain points under Apple Silicon's unified memory architecture, the new version introduces the oQ automatic proxy mechanism, which intelligently handles VRAM shortages and markedly improves the stability of large models on consumer devices. The management interface also gains a server restart button, streamlining day-to-day maintenance.
AIbase Comment: From MLX to oMLX, Apple's edge-side AI is catching up to, and in some respects surpassing, cloud solutions at an astonishing pace. The bandwidth advantage of the unified memory architecture, combined with efficient quantization and engine-level optimization, gives local AI distinctive strengths in speed, privacy protection, and real-time response. Local deployment, once dismissed as "lacking," has now improved in speed, integration, and usability to a degree some would call "ridiculous."
This update sends a clear signal: AI is genuinely moving from the cloud to personal devices. More users may soon experience the freedom and power of running a large model on their own computer.
Project Address: https://github.com/jundot/omlx
