On May 13 (Beijing time), the local AI ecosystem on Apple Silicon took a notable step forward: the oMLX framework released version 0.3.9.dev2, integrating several cutting-edge optimizations that significantly improve the speed and usability of local large models for image and text processing, and further strengthening the practical competitiveness of Apple's edge-side AI.

Core Technology Upgrade: Full Support for Gemma4 Visual Path
The headline change in this release is the direct integration of Gemma4's MTP visual path together with the DFlash engine and ParoQuant quantization. This combination substantially speeds up multimodal decoding for images and text, lowering the latency barrier to running multimodal large models locally. The "experience gap" long criticized in local AI is noticeably narrowed by this round of optimization.
Usability Leap: omlx launch copilot for Instant Access to Top Tools
To lower the barrier for developers and users, oMLX adds an omlx launch copilot command. With one click, users can connect to mainstream AI tools such as Claude, Codex, and OpenClaw, enabling seamless collaboration between local and cloud services. This markedly improves local AI integration, making out-of-the-box usage a reality.
Resource Management: oQ Smart Proxy Tackles VRAM Limitations
Addressing real deployment pain points under Apple Silicon's unified memory architecture, the new version introduces the oQ automatic proxy mechanism, which intelligently handles VRAM shortages and markedly improves the stability of large models on consumer devices. The management interface also gains a server restart button, streamlining day-to-day maintenance.
AIbase Comment: From MLX to oMLX, Apple's edge-side AI is catching up to, and in some respects surpassing, cloud solutions at an astonishing pace. The bandwidth advantage of the unified memory architecture, combined with efficient quantization and engine-level optimization, gives local AI distinctive strengths in speed, privacy protection, and real-time response. Local deployment, once dismissed as "lacking," has now improved in speed, integration, and usability to a degree some would call "ridiculous."
This update sends a clear signal: AI is genuinely moving from the cloud to personal devices. More users may soon experience the freedom and power of running a large model on their own computer.
Project Address: https://github.com/jundot/omlx
