Where is the computing limit of smartphones?
On March 23, a large language model with 400 billion parameters was successfully run on a smartphone.
Technical "Black Technology": Flash Memory Streaming and Mixture of Experts (MoE) Model
Given the phone's severe shortage of memory capacity, this "impossible task" was achieved mainly through two techniques:
Forced SSD "Expansion": using the open-source project Flash-MoE, the device streams model weights directly from solid-state storage (SSD) to the GPU, breaking through the physical memory limit.
Advantages of the MoE Architecture: "MoE" stands for Mixture of Experts, meaning that generating each word activates only a small fraction of the 400 billion parameters, rather than loading the entire model at once.
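The two mechanisms above can be sketched together in a toy example: the expert weights live in a file standing in for flash storage, the file is memory-mapped so nothing is resident in RAM until touched, and only the top-k experts chosen by a router are actually read for each token. All names and sizes here (NUM_EXPERTS, TOP_K, HIDDEN) are illustrative assumptions, not the real Flash-MoE interface.

```python
# Toy sketch of MoE-style sparse expert activation with weights streamed
# from storage. Illustrative only; not the actual Flash-MoE implementation.
import os
import tempfile

import numpy as np

NUM_EXPERTS = 8   # total experts stored on "flash" (assumed toy size)
TOP_K = 2         # experts actually activated per token
HIDDEN = 4        # toy hidden dimension

# 1. Write all expert weights to a file, standing in for the SSD.
weights = np.random.default_rng(0).standard_normal(
    (NUM_EXPERTS, HIDDEN, HIDDEN)).astype(np.float32)
path = os.path.join(tempfile.mkdtemp(), "experts.npy")
np.save(path, weights)

# 2. Memory-map the file: pages are read from disk only when accessed,
#    so the full 8-expert tensor never has to fit in RAM at once.
flash = np.load(path, mmap_mode="r")

def moe_forward(x: np.ndarray, router_logits: np.ndarray) -> np.ndarray:
    """Route a token to its TOP_K experts and read only their weights."""
    top = np.argsort(router_logits)[-TOP_K:]      # chosen expert ids
    gates = np.exp(router_logits[top])
    gates /= gates.sum()                          # softmax over chosen experts
    out = np.zeros_like(x)
    for gate, e in zip(gates, top):
        w = np.asarray(flash[e])                  # pulls ONLY expert e from disk
        out += gate * (x @ w)
    return out

token = np.ones(HIDDEN, dtype=np.float32)
logits = np.array([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3])
y = moe_forward(token, logits)
print(y.shape)  # only 2 of 8 expert matrices were ever read
```

The point of the sketch is the ratio: per token, only TOP_K/NUM_EXPERTS of the weights cross the storage bus, which is what makes a model far larger than RAM feasible at all, at the cost of disk-read latency on every step.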
Speed Drawback: A Word Every Two Seconds
Although it "ran successfully," the actual experience is still far from being "usable." Test results show:
Generation Speed: only 0.6 tokens/second, i.e., roughly 1.7 seconds to produce each word.
Power Consumption: sustained local computation at this intensity drains the battery quickly, and the heat generated is far from negligible.
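At the reported 0.6 tokens/second, the practical impact is easy to sanity-check with a little arithmetic (the 200-token response length below is an assumed example, not from the test):

```python
# Back-of-the-envelope latency at the reported 0.6 tokens/second.
tokens_per_second = 0.6
seconds_per_token = 1 / tokens_per_second        # ~1.67 s per token

response_tokens = 200                            # assumed typical short answer
total_minutes = response_tokens * seconds_per_token / 60
print(f"{seconds_per_token:.2f} s/token, "
      f"{total_minutes:.1f} min for {response_tokens} tokens")
# → 1.67 s/token, 5.6 min for 200 tokens
```

In other words, even a short answer takes minutes end to end, which is why the demo is a proof of feasibility rather than a usable product.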
Industry Insight: The "Singularity" of Local Large Models Is Approaching?
Although the current generation speed is frustrating, the symbolic significance of this demonstration exceeds its practical value. It proves that running top-scale large models locally on a smartphone is not a dead end.
Privacy Protection: running locally means data never needs to leave the device for the cloud, offering strong privacy guarantees.
Offline Feasibility: getting responses from a top-tier model without any internet connection is becoming realistic.
