AMD has officially released a new plugin called vLLM-ATOM. Its core goal is to unlock more of the hardware's potential while leaving existing workflows unchanged, accelerating inference for mainstream large language models such as DeepSeek-R1, Kimi-K2, and gpt-oss-120B.

For developers, vLLM is an open-source inference framework designed to optimize throughput and GPU memory utilization under high concurrency; unlike tools built around single, one-off calls, it centers on request scheduling and cache management. AMD's new ATOM plugin is a deeply customized solution built specifically for Instinct GPUs, and its biggest selling point is "seamless migration": enterprise users do not need to modify existing API interfaces, commands, or end-to-end operational processes, because the plugin takes over automatically and performs the low-level performance optimization in the background.
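To make the "seamless migration" claim concrete, the sketch below uses vLLM's standard offline inference API. The model name and sampling settings are placeholders, and the point is that this script would not need to change when the ATOM plugin is installed, since vLLM discovers and loads plugins at startup rather than through new API calls in user code.

```python
# Standard vLLM offline-inference code; a minimal sketch with placeholder
# model ID and sampling settings. If the ATOM plugin is installed (its exact
# package name is not specified here), vLLM would load it automatically at
# startup -- this script itself stays unchanged.
from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/DeepSeek-R1")  # placeholder model ID
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(
    ["Explain mixture-of-experts routing in two sentences."], params
)
for out in outputs:
    print(out.outputs[0].text)
```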

From a technical architecture perspective, vLLM-ATOM adopts a clean three-layer design. The top layer continues to use vLLM's request scheduling and compatibility interface; the middle layer, the ATOM plugin, handles model implementations and kernel optimization; and the bottom layer, AITER, sits directly on the GPU hardware, providing core acceleration kernels including Flash Attention, quantized GEMM, and fused MoE.
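vLLM discovers plugins through packaging entry points and invokes them during engine initialization, which is how a plugin like ATOM can slot its own model implementations and kernels underneath the unchanged scheduling layer. The sketch below illustrates that general plugin mechanism with a hypothetical plugin module and model class names; it is not ATOM's actual registration code.

```python
# Hypothetical plugin entry point illustrating vLLM's general plugin mechanism.
# A package would expose this function via the "vllm.general_plugins"
# entry-point group in its packaging metadata; vLLM calls it once during
# engine initialization.
def register():
    from vllm import ModelRegistry

    # Map an architecture name to an optimized implementation shipped by the
    # plugin (the module path and class name here are made up for illustration).
    ModelRegistry.register_model(
        "DeepseekV3ForCausalLM",
        "my_atom_like_plugin.models.deepseek:OptimizedDeepseekV3ForCausalLM",
    )
```

In the plugin package's own metadata, this function would be declared as an entry point in the "vllm.general_plugins" group (again, the package and function names above are placeholders), which is what lets the override happen without any change to the serving command or client code.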

The plugin primarily targets high-performance accelerators such as the Instinct MI350, MI400, and MI355X. Its support list covers not only popular models such as Qwen3, GLM, and DeepSeek, but also the full range of architectures, including MoE (Mixture of Experts), dense models, and vision-language models (VLMs).

Industry analysts note that the core value of this approach lies in sharply lowering the barrier to deploying high-performance compute. With this "zero-learning-cost" migration path, enterprises can switch their AI services to an AMD hardware backend more easily, preserving inference efficiency while improving the stability and response speed of online large-model services.