Tencent Cloud has officially launched the API service for the Tencent Hunyuan-A13B model on its official website. Input is priced at 0.5 yuan per million tokens and output at 2 yuan per million tokens, pricing that quickly drew an enthusiastic response from the developer community.
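The published per-million-token prices make cost estimation straightforward. A minimal sketch (the helper function and example token counts are illustrative, not part of any official SDK):

```python
# Estimate Hunyuan-A13B API cost from the published prices:
# 0.5 yuan per million input tokens, 2 yuan per million output tokens.

INPUT_PRICE_PER_M = 0.5   # yuan per million input tokens
OUTPUT_PRICE_PER_M = 2.0  # yuan per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in yuan for a given token volume."""
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_M
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M)

# e.g. 2M input tokens plus 0.5M output tokens:
print(estimate_cost(2_000_000, 500_000))  # → 2.0 yuan
```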

As the industry's first open-source hybrid-reasoning MoE (Mixture of Experts) model at the 13B activation scale, Hunyuan-A13B features a compact design: 80B total parameters with only 13B activated. It achieves performance comparable to leading open-source models of the same class while offering faster inference and markedly better cost-effectiveness. This not only lowers the barrier for developers to access advanced model capabilities but also lays a solid foundation for the broad adoption of AI applications.
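The 80B-total / 13B-active split comes from sparse expert routing: for each token, a gating network activates only the top-k experts rather than the whole network. A generic illustration of top-k routing (not Tencent's actual implementation; expert count and scores are made up):

```python
import math

# Illustrative sparse-MoE routing: only the k highest-scoring experts
# process a token, which is why an 80B total-parameter model can run
# with only ~13B parameters active per token.

def top_k_route(logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in idx)
    exps = [math.exp(logits[i] - m) for i in idx]  # numerically stable softmax
    total = sum(exps)
    return idx, [e / total for e in exps]

gate_logits = [0.1, 2.0, -1.0, 1.5]  # hypothetical gate scores for 4 experts
experts, weights = top_k_route(gate_logits, k=2)
print(experts)  # → [1, 3]: only experts 1 and 3 are activated
```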

Built on an advanced architecture, the Hunyuan-A13B model demonstrates strong general capabilities. It has achieved excellent results on multiple authoritative industry benchmarks, excelling in particular at Agent tool calling and long-text processing. To further strengthen its Agent capabilities, the Tencent Hunyuan team built a multi-Agent data-synthesis framework. By integrating environments such as MCP, sandboxes, and large-language-model simulation, and applying reinforcement learning, the Agent can autonomously explore and learn in diverse environments, significantly improving the model's practical effectiveness.
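Agent tool calling generally follows the widely adopted OpenAI-style function-calling schema, which many open models accept. A sketch of what a tool definition and a resulting tool call look like in that format (the tool name and exact schema here are assumptions; check the official Hunyuan API documentation for its precise format):

```python
import json

# OpenAI-style tool (function) definition -- a common interchange format
# for Agent tool calling; Hunyuan-A13B's exact schema may differ.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool name
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# The model returns a tool call; the client executes it and feeds back the result.
tool_call = {"name": "get_weather", "arguments": json.dumps({"city": "Shenzhen"})}
print(json.loads(tool_call["arguments"])["city"])  # → Shenzhen
```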


In long-text processing, Hunyuan-A13B supports a native 256K context window and maintains excellent performance across multiple long-text benchmarks. The model also introduces a hybrid reasoning mode that lets users switch freely between fast-thinking and slow-thinking modes according to the task, preserving output efficiency while ensuring accuracy on demanding tasks and allocating compute more efficiently.
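Public Hunyuan-A13B materials describe switching reasoning modes via a prompt prefix such as `/no_think`. A minimal sketch of that pattern (the exact switch token is an assumption here; verify it against the official model card):

```python
# Sketch of toggling fast vs. slow thinking via a prompt prefix.
# Assumption: "/no_think" disables the slow-thinking (chain-of-thought)
# mode, as described in public Hunyuan-A13B materials.

def build_prompt(user_msg: str, slow_thinking: bool = True) -> str:
    """Prefix the user message with the assumed reasoning-mode switch."""
    prefix = "" if slow_thinking else "/no_think "
    return prefix + user_msg

print(build_prompt("Prove that sqrt(2) is irrational."))          # slow thinking (default)
print(build_prompt("What is 2 + 2?", slow_thinking=False))        # fast thinking
```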

For individual developers, Hunyuan-A13B is also easy to work with: even under constrained conditions, it can be deployed on a single mid-range GPU card. The model is already integrated into the mainstream open-source inference-framework ecosystem and supports multiple quantization formats. At the same input and output scale, its overall throughput is more than twice that of cutting-edge open-source models, demonstrating both performance and flexibility.
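As an illustration of serving the model through a mainstream inference framework, a hedged launch sketch for vLLM's OpenAI-compatible server (the model ID and flags are assumptions based on common vLLM usage; consult the official model card for the exact command):

```shell
# Sketch: serve Hunyuan-A13B via vLLM's OpenAI-compatible server.
# --max-model-len starts conservatively; it can be raised toward the
# native 256K context if GPU memory allows.
vllm serve tencent/Hunyuan-A13B-Instruct \
    --trust-remote-code \
    --max-model-len 32768
```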

The success of Hunyuan-A13B stems from the Tencent Hunyuan team's innovations in both the pre-training and post-training stages. In pre-training, the team trained on a corpus of 20T tokens spanning many domains, significantly strengthening the model's general capabilities. Through systematic analysis and modeling experiments, the team also derived a joint Scaling Law formula applicable to the MoE architecture, providing quantitative engineering guidance for MoE design. In post-training, a multi-stage training approach further improved the model's reasoning ability and generality.
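Tencent's joint MoE formula has not been published; as context, the standard dense-model starting point that MoE scaling laws typically extend is the Chinchilla form, where $N$ is parameter count, $D$ is training tokens, and $E$ is the irreducible loss:

```latex
% Generic Chinchilla-style scaling law for dense models, shown only as
% the common starting point; MoE variants typically treat activated and
% total parameter counts as separate variables.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

For an MoE model such as Hunyuan-A13B, the key modeling question such a joint formula answers is how loss depends on activated parameters (13B) versus total parameters (80B) at a given token budget.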

As one of the most heavily used large language models inside Tencent, Hunyuan-A13B serves more than 400 business scenarios with a daily request volume exceeding 130 million, fully demonstrating its value and stability in real-world applications.

https://cloud.tencent.com/product/tclm