AMD, in collaboration with IBM and AI startup Zyphra, has launched ZAYA1, the world's first MoE foundation model trained entirely on AMD hardware. It is pre-trained on 14T tokens, with overall performance comparable to the Qwen3 series, and its mathematical/STEM reasoning, even without instruction fine-tuning, approaches that of specialized Qwen3 models.

Training Scale
- Cluster: 128 IBM Cloud nodes × 8 AMD Instinct MI300X GPUs each, 1,024 GPUs in total; Infinity Fabric + ROCm, with peak performance of 750 PFLOPS
- Data: 14T tokens, with a curriculum that shifts from general web text to mathematics/code/reasoning data (see the sketch after this list); post-trained versions will be released separately
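A minimal sketch of what such a staged data curriculum could look like. The stage boundaries, domain names, and mixture weights below are illustrative assumptions, not ZAYA1's actual recipe:

```python
# Hypothetical staged data-mixture curriculum: early training samples mostly general
# web text, later stages shift weight toward math/code/reasoning. All numbers are
# illustrative assumptions.
import random

CURRICULUM = [
    # (tokens consumed so far, {domain: sampling weight})
    (0e12,  {"web": 0.85, "math": 0.05, "code": 0.05, "reasoning": 0.05}),
    (8e12,  {"web": 0.55, "math": 0.15, "code": 0.15, "reasoning": 0.15}),
    (12e12, {"web": 0.30, "math": 0.25, "code": 0.25, "reasoning": 0.20}),
]

def mixture_for(tokens_seen: float) -> dict[str, float]:
    """Return the domain sampling weights for the current training progress."""
    weights = CURRICULUM[0][1]
    for boundary, mix in CURRICULUM:
        if tokens_seen >= boundary:
            weights = mix
    return weights

def sample_domain(tokens_seen: float) -> str:
    """Pick the next batch's source domain according to the current mixture."""
    mix = mixture_for(tokens_seen)
    domains, probs = zip(*mix.items())
    return random.choices(domains, weights=probs, k=1)[0]

if __name__ == "__main__":
    for t in (1e12, 9e12, 13e12):
        print(f"{t/1e12:.0f}T tokens -> {mixture_for(t)}")
```
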
Architecture Innovations
1. CCA attention: convolutional, compressed-embedding attention heads that cut memory usage by 32% and raise long-context throughput by 18% (a rough sketch follows this list)
2. Linear-router MoE: finer-grained experts plus load-balancing regularization, improving top-2 routing accuracy by 2.3 percentage points and keeping expert utilization high even at 70% sparsity (see the router sketch below)
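The exact CCA design is not detailed here, so the following is a rough, hypothetical PyTorch sketch of the general idea: project hidden states into a compressed space, mix local context with a short causal depthwise convolution, and attend in that smaller space so attention and KV-cache memory shrink. All dimensions and layer choices are assumptions for illustration.

```python
# Hypothetical sketch of compressed, convolutional attention (not ZAYA1's published code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompressedConvAttention(nn.Module):
    def __init__(self, d_model=4096, d_compressed=1024, n_heads=8, kernel_size=4):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_compressed // n_heads
        # Down-projection into the compressed embedding space.
        self.compress = nn.Linear(d_model, d_compressed, bias=False)
        # Depthwise causal convolution for cheap local token mixing.
        self.conv = nn.Conv1d(d_compressed, d_compressed, kernel_size,
                              groups=d_compressed, padding=kernel_size - 1)
        self.qkv = nn.Linear(d_compressed, 3 * d_compressed, bias=False)
        self.out = nn.Linear(d_compressed, d_model, bias=False)

    def forward(self, x):                      # x: (batch, seq, d_model)
        b, t, _ = x.shape
        z = self.compress(x)                   # (b, t, d_compressed)
        z = self.conv(z.transpose(1, 2))[..., :t].transpose(1, 2)  # causal conv, trim pad
        q, k, v = self.qkv(z).chunk(3, dim=-1)
        # Reshape to heads and attend entirely in the compressed space.
        shape = (b, t, self.n_heads, self.d_head)
        q, k, v = (u.view(shape).transpose(1, 2) for u in (q, k, v))
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).reshape(b, t, -1)
        return self.out(y)
```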
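For the routing side, here is a minimal sketch of a linear top-2 router with a load-balancing auxiliary loss in the common Switch-Transformer style; it is an assumed stand-in to illustrate the mechanism, not ZAYA1's published router, and the expert count and loss weight are placeholders.

```python
# Minimal top-2 routing with a load-balancing auxiliary loss (illustrative assumption).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2Router(nn.Module):
    def __init__(self, d_model=4096, n_experts=64, aux_weight=0.01):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # linear routing layer
        self.n_experts = n_experts
        self.aux_weight = aux_weight

    def forward(self, x):                       # x: (tokens, d_model)
        logits = self.gate(x)                   # (tokens, n_experts)
        probs = logits.softmax(dim=-1)
        top_p, top_idx = probs.topk(2, dim=-1)  # top-2 experts per token
        top_p = top_p / top_p.sum(-1, keepdim=True)  # renormalize the two gate weights

        # Load-balancing regularizer: penalize the product of (fraction of tokens
        # routed to each expert) and (mean gate probability of that expert) so that
        # usage stays close to uniform across experts.
        routed = F.one_hot(top_idx, self.n_experts).float().sum(1)  # (tokens, n_experts)
        load = routed.mean(0)                   # fraction of tokens per expert
        importance = probs.mean(0)              # mean gate probability per expert
        aux_loss = self.aux_weight * self.n_experts * (load * importance).sum()

        return top_idx, top_p, aux_loss
```

The auxiliary loss would simply be added to the language-modeling loss during training; the specific regularizer ZAYA1 uses is not described here.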
Benchmark Results
ZAYA1-Base (the non-instruction-tuned version) matches Qwen3-Base on benchmarks such as MMLU-Redux, GSM8K, MATH, and ScienceQA, and clearly outperforms it on CMATH and OCW-Math, underscoring its STEM potential. Zyphra said that instruction-tuned and RLHF versions will launch in Q1 2026, with both API access and weight downloads available.
AMD stated that this collaboration validates the feasibility of MI300X + ROCm for large-scale MoE training. Going forward, it plans to replicate the "pure AMD" cluster setup with more cloud providers, aiming to reach cost parity with NVIDIA solutions for training MoE models above 100B parameters by 2026.
