JD.com recently announced that it has open-sourced xLLM, an inference engine developed in-house and built around domestically produced chips. The release is intended to help enterprises deploy artificial intelligence (AI) applications with higher performance at lower cost, furthering the industry's shift toward intelligent operations.
xLLM's core features span scheduling, architecture, and multimodal support. First, a dynamic scheduler driven by request priority allocates computing resources according to the importance of each request, so that critical tasks are served first (a minimal sketch of this idea follows below). Second, a dynamically adaptive PD (prefill-decode) separation architecture adjusts the ratio of processing instances based on real-time load, improving resource utilization. Notably, xLLM is also adapted for multimodal scenarios through an EPD hybrid separation scheduler, giving complex AI applications a more flexible deployment option.
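To make the scheduling idea concrete, here is a minimal Python sketch of priority-based request dispatch. Everything in it, including the `PriorityScheduler` class, its methods, and the priority values, is an illustrative assumption rather than xLLM's actual API.

```python
import heapq
import itertools
from dataclasses import dataclass, field

@dataclass(order=True)
class Request:
    # Lower value = higher priority; the sequence number breaks ties,
    # so equal-priority requests are served first-in, first-out.
    priority: int
    seq: int
    prompt: str = field(compare=False)

class PriorityScheduler:
    """Illustrative priority-aware scheduler: critical requests jump the queue."""
    def __init__(self) -> None:
        self._heap: list[Request] = []
        self._counter = itertools.count()

    def submit(self, prompt: str, priority: int) -> None:
        heapq.heappush(self._heap, Request(priority, next(self._counter), prompt))

    def next_batch(self, budget: int) -> list[Request]:
        # Fill a batch of up to `budget` requests, highest priority first.
        batch: list[Request] = []
        while self._heap and len(batch) < budget:
            batch.append(heapq.heappop(self._heap))
        return batch

sched = PriorityScheduler()
sched.submit("summarize order history", priority=2)  # background task
sched.submit("fraud check on payment", priority=0)   # critical: runs first
print([r.prompt for r in sched.next_batch(budget=2)])
```

In a real engine the batch would also be bounded by token and memory budgets, but the ordering principle is the same: importance first, arrival order second.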

On the technical side, xLLM employs a multi-level pipelined execution engine designed around the characteristics of the underlying hardware, so that different kinds of work are processed efficiently. The engine also includes a computational optimization suite covering graph fusion, speculative decoding, and dynamic load balancing, which together raise inference efficiency significantly. To further improve performance, xLLM uses Mooncake to build a globally managed, multi-level KV cache, reducing redundant prefill computation (a simplified sketch of tiered cache lookup follows below).
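The following Python sketch shows what a multi-level KV cache lookup might look like in miniature. It is a simplified illustration under assumed semantics, not xLLM's or Mooncake's implementation: a real system manages tensors across device, host, and distributed tiers, while this toy version uses two in-process LRU maps.

```python
from collections import OrderedDict

class TieredKVCache:
    """Simplified two-tier KV cache: a small fast tier (standing in for device
    HBM) backed by a larger slow tier (standing in for host DRAM)."""
    def __init__(self, device_slots: int, host_slots: int) -> None:
        self.device: OrderedDict[str, bytes] = OrderedDict()  # hot tier
        self.host: OrderedDict[str, bytes] = OrderedDict()    # warm tier
        self.device_slots = device_slots
        self.host_slots = host_slots

    def get(self, block_hash: str) -> bytes | None:
        if block_hash in self.device:            # hit in the fast tier
            self.device.move_to_end(block_hash)  # refresh LRU position
            return self.device[block_hash]
        if block_hash in self.host:              # hit in the slow tier:
            self.put(block_hash, self.host.pop(block_hash))  # promote it
            return self.device[block_hash]
        return None                              # miss: caller recomputes prefill

    def put(self, block_hash: str, kv: bytes) -> None:
        if len(self.device) >= self.device_slots:
            # Evict the coldest device block down to the host tier.
            old_key, old_kv = self.device.popitem(last=False)
            self.host[old_key] = old_kv
            if len(self.host) > self.host_slots:
                self.host.popitem(last=False)    # drop from the warm tier entirely
        self.device[block_hash] = kv
```

The point of global management is that a cached prefix can be reused across requests and instances instead of being recomputed, which is where much of the efficiency gain in disaggregated serving comes from.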
JD.com stated that the technology originated in its core retail business and has already been deployed in scenarios such as the JD AI Assistant, intelligent customer service, risk control, and supply chain assistance. In these deployments, efficiency has increased more than fivefold while machine costs have dropped by 90%, figures that point to xLLM's broader potential.
"We firmly believe that the construction of an AI infrastructure ecosystem cannot be separated from the contributions of every developer. Open-sourcing is just the first step. In the future, JD will continue to open up more advanced features according to community needs and work with research and industry partners such as Tsinghua University, Peking University, and USTC to promote the innovation and development of domestic AI infrastructure technology," said the JD Retail AI Infrastructure team.
With xLLM now open source, developers can experience the inference engine firsthand and contribute to the growth of China's AI technology ecosystem.
