At 4 a.m. on December 4, GitHub and DesignArena simultaneously leaked OpenAI's new model matrix, internally code-named "Penguin," revealing for the first time four tiers of inference budget: the flagship Emperor512, mid-range Rockhopper64, lightweight Macaroni16, and zero-inference Mumble0, covering scenarios from cloud to edge.

512 Inference Budget! Emperor May Become the Core of GPT-5.2
Internal documents show that Emperor carries a 512-unit "juice" inference budget, eight to ten times that of current models, with end-to-end latency held under 80 ms for a "zero-wait" conversation experience; real-time pruning and dynamic computation allocation are already embedded in the code path, making it a likely underlying architecture for next year's GPT-5.2.
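The leak names "dynamic computation allocation" but gives no implementation details. As a rough, generic illustration of the idea (spend reasoning steps only until the model is confident, never exceeding the tier's budget), here is a minimal early-exit sketch; the function names, the `score_step` callback, and the threshold are all hypothetical, not the leaked code path:

```python
# Generic illustration of budgeted, early-exiting computation.
# `score_step(i)` stands in for one reasoning step and returns a
# confidence score in [0, 1]; this is NOT OpenAI's leaked code.

def generate_with_budget(score_step, budget: int, threshold: float = 0.9):
    """Run up to `budget` reasoning steps, exiting early when confident."""
    spent = 0
    confidence = 0.0
    for i in range(budget):
        confidence = score_step(i)
        spent += 1
        if confidence >= threshold:
            break  # dynamic allocation: stop paying for steps we don't need
    return confidence, spent
```

With a budget of 512 (Emperor's reported tier), an easy query that gains confidence quickly exits after a handful of steps, while a hard query can consume the full budget.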
Four levels of budget = four levels of latency: Macaroni focuses on speed, while Mumble completely skips the inference process.
- Rockhopper (64), the mid-range tier, balances inference quality and speed, and is positioned to replace GPT-4.5
- Macaroni (16) targets mobile devices, reportedly running a 70B model on the 8Gen3 chip for the first time
- Mumble (0) skips the inference step entirely, with response times under 50 ms, intended for high-frequency automation and voice-interruption scenarios
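The four tiers above can be read as a simple latency/budget lookup. The sketch below encodes the leaked names and numbers and picks the highest-budget tier that fits a latency ceiling; the Rockhopper and Macaroni latency targets are assumptions (the leak only gives figures for Emperor and Mumble), and the routing function itself is purely illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier:
    name: str
    reasoning_budget: int   # leaked "juice" units
    target_latency_ms: int  # end-to-end latency goal

TIERS = {
    "emperor":    Tier("Emperor",    512, 80),   # "within 80ms" per the leak
    "rockhopper": Tier("Rockhopper",  64, 200),  # latency assumed, not in leak
    "macaroni":   Tier("Macaroni",    16, 100),  # latency assumed, not in leak
    "mumble":     Tier("Mumble",       0, 50),   # "<50ms" per the leak
}

def pick_tier(max_latency_ms: int, min_budget: int = 0) -> Tier:
    """Pick the highest-budget tier that fits a latency ceiling."""
    candidates = [t for t in TIERS.values()
                  if t.target_latency_ms <= max_latency_ms
                  and t.reasoning_budget >= min_budget]
    if not candidates:
        raise ValueError("no tier satisfies the constraints")
    return max(candidates, key=lambda t: t.reasoning_budget)
```

For example, a hard 50 ms ceiling leaves only Mumble, while a 100 ms ceiling admits Emperor per the leaked latency figure.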
Memory Search Exposed Simultaneously: One-click Recall of Conversation Context
The same leaked code shows that ChatGPT will gain a "Memory Search" button that lets users retrieve past memories instantly with a natural-language query instead of scrolling through chat records; the feature has already been tested internally and is expected to launch alongside the Penguin family.
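The leak describes Memory Search only at a high level: a natural-language query ranked against stored conversation history. As an illustrative stand-in (OpenAI's version would presumably use learned embeddings, not this bag-of-words cosine ranking), a minimal sketch might look like:

```python
# Hypothetical sketch of natural-language search over chat history.
# Bag-of-words cosine similarity is a stand-in for real embeddings.

import math
from collections import Counter

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def memory_search(query: str, history: list[str], k: int = 3) -> list[str]:
    """Return the k past messages most similar to the query."""
    q = _vec(query)
    ranked = sorted(history, key=lambda m: _cosine(q, _vec(m)), reverse=True)
    return ranked[:k]
```

The point of the sketch is the interface, not the ranking method: one free-text query in, the most relevant past messages out, no manual scrolling.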
Shallotpeat & Garlic Follow: OpenAI Sounds the Red Alert
The Information adds that OpenAI may release a new inference model code-named Shallotpeat next week, with performance reportedly already surpassing Gemini 3; another model, Garlic, has finished pre-training and is slated for an early-2026 launch as GPT-5.2/5.5, pursuing a "small parameters, high-density knowledge" approach to counter Google's "pre-training leap."
Industry Shake-up: Open Source and Closed Source Both Accelerate
- For developers, four budget tiers mean a single code name can be called with different latency-cost-accuracy combinations, and API pricing is expected to drop by 30%
- For competitors, OpenAI has for the first time put an explicit price on "inference budget," pressuring Google and Anthropic to adopt similar tiering
- For regulators, the ultra-fast responses of zero-inference Mumble could amplify error rates, making safety assessment the last checkpoint before release
OpenAI has not yet announced a release date for the Penguin family, but blind testing has already begun on DesignArena, and the winning model will be integrated directly into the ChatGPT Plus and Enterprise channels. AIbase will continue tracking the story and bring you benchmark results and API pricing details as soon as they emerge.
