The top AI conference NeurIPS 2025 announced its award winners tonight. The Tongyi Qianwen team from Alibaba won a Best Paper Award with "Attention Gating Makes Better Foundation Models," the only Chinese entry among the four winning papers. This year's conference received 20,000 submissions with an acceptance rate of only 25%, the most competitive in the conference's history.

The core of the paper is a "sliding door": a learnable gate added after standard attention that decides in real time which heads and tokens continue to participate in downstream computation. In experiments on a 1.7B dense model and a 15B MoE model trained on 3.5T tokens, the gate added only about 1% more parameters while reducing perplexity by 0.2 and raising MMLU by 2 points, with consistent improvements across all subdomains of the Pile. The team explained that the gate acts like a "security check" for attention, blocking irrelevant information before it reaches the FFN, thus improving both computational efficiency and robustness.
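
To make the idea concrete, here is a minimal sketch of what such an output gate could look like, assuming a sigmoid gate computed from the layer input and applied elementwise to the attention output before the output projection. The class and parameter names (GatedAttention, gate_proj) are illustrative, not taken from the paper's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttention(nn.Module):
    """Self-attention followed by a learnable sigmoid gate (illustrative sketch)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv_proj = nn.Linear(d_model, 3 * d_model, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)
        # Hypothetical gate: one sigmoid value per channel, derived from the
        # layer input, deciding how much of each head's output passes downstream.
        self.gate_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv_proj(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, tokens, head_dim) for multi-head attention.
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v))
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(b, t, d)
        # Elementwise gate: values near zero block a head's contribution for a
        # given token before it reaches the FFN.
        gate = torch.sigmoid(self.gate_proj(x))
        return self.out_proj(gate * attn)
```

Gating the concatenated head outputs elementwise, rather than the attention weights themselves, adds only a single extra linear projection per layer, which would be consistent with the roughly 1% parameter overhead the paper reports.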

This mechanism has been integrated into the upcoming Qwen3-Next. Alibaba has also open-sourced the code and the 1.7B experimental model on GitHub for community re-validation. Tongyi Qianwen stated that the next step is to extend the gating idea to multimodal and long-context scenarios, making "attention that can filter itself" a standard component of next-generation large models.