Zhipu and Tsinghua University Propose ZCube Networking Architecture: 15% Higher Large Model Inference Throughput, One-Third Lower Network Costs

Large model inference is redefining AI infrastructure, and innovation in network architecture has become a key way to unlock hardware potential. In September 2025, Zhipu, Yuchen Network, and Tsinghua University published their research on the ZCube network architecture at ACM SIGCOMM 2025, the top conference in the networking field.

On May 21, 2026, Zhipu announced that the architecture had been deployed in the GLM-5.1 coding production environment, with significant performance gains. In benchmark tests with the GPUs, software stack, and applications unchanged, ZCube reduced capital expenditure on switches and optical modules by 33%, increased average GPU inference throughput by 15%, and cut P99 time to first token (TTFT) by 40.6%, a system-level result that combines lower cost with higher performance.

Now that long-context inference and Prefill-Decode (PD) disaggregated deployment have become industry standard, cross-node KV Cache transfers are highly asymmetric. Traditional ROFT (Rail-Optimized Fat-Tree) architectures, built by stacking multiple switch layers, are constrained by their static topology. They are prone to local hotspots and PFC backpressure, which creates a structural bottleneck: "sufficient total bandwidth but frequent local congestion."
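The "sufficient total bandwidth but frequent local congestion" effect can be illustrated with a toy calculation. The sketch below is not taken from the ZCube work: the flow sizes, the link count, the link capacity, and the modulo-based `static_hash` function are all hypothetical stand-ins for a static path-selection scheme.

```python
def link_loads(flow_sizes, num_links, assign):
    """Sum the traffic that the given assignment places on each uplink."""
    loads = [0.0] * num_links
    for flow_id, size in enumerate(flow_sizes):
        loads[assign(flow_id, num_links)] += size
    return loads


def static_hash(flow_id, num_links):
    # Hypothetical stand-in for static hashing: the chosen uplink depends
    # only on the flow id, never on how heavy the flow is.
    return flow_id % num_links


# Hypothetical KV Cache transfers in Gbit/s: two heavy flows, six light ones.
flows = [80, 5, 5, 5, 80, 5, 5, 5]
NUM_LINKS = 4            # assumed number of uplinks
LINK_CAPACITY = 100.0    # assumed capacity per uplink in Gbit/s

loads = link_loads(flows, NUM_LINKS, static_hash)
total_utilization = sum(flows) / (NUM_LINKS * LINK_CAPACITY)
hottest_utilization = max(loads) / LINK_CAPACITY

print(f"per-link load: {loads}")
print(f"aggregate utilization: {total_utilization:.1%}")
print(f"hottest link utilization: {hottest_utilization:.1%}")
```

In this made-up case the fabric as a whole is less than half used, yet both heavy flows land on the same uplink and oversubscribe it, which is the kind of imbalance that can trigger PFC backpressure.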

To address this pain point, the ZCube architecture breaks away from the hierarchical stacking of the traditional Clos architecture. It eliminates the Spine-layer switches and interconnects two groups of completely flat switches as a bipartite graph, combined with a hybrid single-rail/multi-rail access mechanism for dual-port NICs. With its routing strategy, ZCube ensures that any GPU pair has a dedicated optimal path, which balances traffic load at the structural level and supports ultra-large-scale expansion to tens of thousands or even hundreds of thousands of GPUs.
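The idea of two flat switch groups joined as a bipartite graph can be sketched with a small graph model. This is only an illustration under stated assumptions, not the ZCube design from the paper: it assumes every switch in group A links to every switch in group B, and that each GPU's dual-port NIC attaches one port to a group A switch and the other to a group B switch. The names `build_toy_fabric` and `switch_hops` are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_toy_fabric(k):
    """Build an adjacency map for a toy fabric with two flat groups of k switches."""
    adj = {}

    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Assumption: complete bipartite wiring between group A and group B.
    for i in range(k):
        for j in range(k):
            link(("A", i), ("B", j))

    # Assumption: one GPU per (A switch, B switch) pair, one NIC port to each.
    gpus = []
    for i in range(k):
        for j in range(k):
            gpu = ("GPU", i, j)
            link(gpu, ("A", i))
            link(gpu, ("B", j))
            gpus.append(gpu)
    return adj, gpus


def switch_hops(adj, src, dst):
    """Breadth-first search that returns how many switches the shortest path crosses."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node] - 1  # edges on the path minus one equals switches
        for nxt in adj[node]:
            # GPUs are endpoints only; they never forward traffic for others.
            if nxt not in dist and (nxt == dst or nxt[0] != "GPU"):
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


adj, gpus = build_toy_fabric(4)
worst = max(switch_hops(adj, a, b) for a, b in combinations(gpus, 2))
print(f"{len(gpus)} GPUs, worst-case switches on a path: {worst}")
```

In this toy model every GPU pair is reachable through at most two switches without a third switching tier, which is the structural property a Spine-free design aims for. The real ZCube wiring, port counts, and routing rules may differ.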

During the production environment migration, the Yuchen Network team used automated control and verification tools to overcome the challenges of recabling and rebuilding the routing strategy, ensuring a fast and stable cluster upgrade. The current thousand-GPU cluster has been running stably for more than two weeks. The successful deployment of ZCube marks a shift in intelligent computing infrastructure from general-purpose interconnection to system-level collaboration driven by model traffic. In the future, the deep integration of network topology, communication libraries, and scheduling strategies will become the core driver for further improving token production efficiency and reducing overall MaaS costs.

Alibaba Cloud BaiLian Launches Major Upgrade: Full-Stack Open Access, Building an AI Model Supermarket

On May 20, during the summit, Alibaba Cloud announced that its large model service platform "BaiLian" has expanded its open ecosystem, integrating top third-party models from multiple companies across text, image, video, and multimodal generation. The move marks BaiLian's transformation from a showcase for Alibaba's self-developed Qianwen model into an AI model supermarket that carries mainstream models from across the industry. The first batch of integrated models is rich and diverse.

Breakthrough for OpenAI Reasoning Model: AI Successfully Refutes Erdős Unit Distance Conjecture

On May 20, 2026, OpenAI's reasoning model overturned Paul Erdős' 1946 "unit distance conjecture," solving a core problem in discrete geometry that had persisted for nearly 80 years. Unlike previous AI achievements, this result gained widespread academic recognition, marking a key step for AI from "retrieval" to "original creation."

Tencent AI Assistant Maavis Officially Launched, Supports Windows/Mac/Android Platforms

Tencent has launched Maavis, a cross-platform AI assistant for Windows, Mac, and Android that turns computers into intelligent dialogue assistants. Its core innovation lies in integrating the device's system, files, applications, and computing power. It features six collaborative Agents that form an AI team: a main Agent coordinates and schedules, while specialized Agents work together to improve users' efficiency at work and in daily life.

ZhiXiang Future Launches 200B-Parameter Native Multimodal Image Large Model, Embarking on a New Journey from Generating Content to Understanding the World

At its Beijing Open Day, ZhiXiang Future released HiDream-O1-Image-Pro, an image model based on a Unified Transformer architecture with over 200 billion parameters that achieved multiple SOTA results. The company also completed its second funding round within half a month, backed by top investors such as Shenzhen Capital Group and Jinpu Investment, highlighting capital market recognition of native full-modal technology.
