At today's Alibaba Cloud Tongyi Intelligent Hardware Exhibition, Alibaba Cloud officially launched a multimodal interaction development kit, aiming to provide intelligent hardware manufacturers with an "out-of-the-box" AI capability foundation. The kit integrates three core Tongyi large models and pre-installs more than ten AI Agents and MCP (Model Context Protocol) tools tailored for scenarios such as daily life and work efficiency, enabling rapid empowerment of terminal devices like AI glasses, learning machines, companion toys, and smart robots, and significantly lowering the barrier to making hardware intelligent.
Integration of Three Models to Build Intelligent Terminals That Can Hear, See, and Express
The core advantage of this development kit lies in its native integration of multimodal capabilities:
- Tongyi Qianwen: provides strong text understanding and generation, task planning, and dialogue logic;
- Tongyi Wanxiang: supports text-to-image, image-to-image, visual understanding, and style transfer, enhancing visual interaction;
- Tongyi Bailin: focuses on speech recognition, speech synthesis, and voiceprint identification, enabling natural speech interaction.
Together, these capabilities allow hardware devices to process voice commands, image inputs, and text context simultaneously, enabling complex multimodal tasks such as "taking a photo of a problem and explaining the solution steps" or "describing a scene you want drawn, generating the image, and narrating it aloud."
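As an illustration of how such a chain might look on the device side, the minimal Python sketch below walks through the "photograph a problem, explain it, read it aloud" flow. The endpoint URLs, request fields, and model roles are hypothetical assumptions for illustration only, not the kit's documented API.

```python
"""Hypothetical sketch of the 'photo a problem, explain it, read it aloud' flow.

All endpoint paths, parameter names, and field names below are illustrative
assumptions, not the kit's real API surface.
"""
import base64
import requests

API_BASE = "https://example-multimodal-kit.invalid/v1"  # placeholder, not a real endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}       # placeholder credential


def describe_image(image_path: str) -> str:
    """Send a photo to a hypothetical visual-understanding endpoint (the Wanxiang role)."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    resp = requests.post(f"{API_BASE}/vision/understand",
                         headers=HEADERS,
                         json={"image": image_b64, "task": "problem_ocr"})
    resp.raise_for_status()
    return resp.json()["text"]


def explain_solution(problem_text: str) -> str:
    """Ask a hypothetical text endpoint (the Qianwen role) for step-by-step reasoning."""
    resp = requests.post(f"{API_BASE}/text/generate",
                         headers=HEADERS,
                         json={"prompt": f"Explain step by step how to solve: {problem_text}"})
    resp.raise_for_status()
    return resp.json()["output"]


def read_aloud(text: str, out_path: str = "answer.mp3") -> str:
    """Synthesize speech via a hypothetical TTS endpoint (the speech-model role)."""
    resp = requests.post(f"{API_BASE}/speech/synthesize",
                         headers=HEADERS,
                         json={"text": text, "voice": "child_friendly"})
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)  # raw audio bytes
    return out_path


if __name__ == "__main__":
    problem = describe_image("photo_of_homework.jpg")
    steps = explain_solution(problem)
    audio = read_aloud(steps)
    print(f"Explanation saved to {audio}")
```

The point of the sketch is the orchestration pattern rather than any specific endpoint: vision, text, and speech capabilities are composed sequentially, with each model's output feeding the next.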
Pre-installed Agents + MCP Tools to Accelerate Scenario Implementation
To improve development efficiency, the kit includes more than ten AI Agents and MCP tools that can be directly called, covering high-frequency scenarios:
- Learning Companion: homework tutoring agent, knowledge-point Q&A, English speaking practice;
- Life Assistant: schedule management, health reminders, smart home control;
- Creative Entertainment: AI painting assistant, story generator, music creation tools;
- Work Efficiency: meeting note generation, document summary, multilingual real-time translation.
Hardware manufacturers do not need to train models from scratch. They can simply integrate via API or SDK and give their products "human-like" interaction capabilities within a few weeks.
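As a rough idea of what such an integration could look like, the sketch below invokes a named pre-installed agent over HTTP from device-side code. The agent name, endpoint path, and response fields are hypothetical placeholders; the actual interface would be defined by the kit's SDK documentation.

```python
"""Hypothetical sketch of device firmware calling a pre-installed agent by name.

The agent identifier, endpoint path, and session fields are assumptions for
illustration; the real kit's SDK may expose a different surface.
"""
import requests

API_BASE = "https://example-multimodal-kit.invalid/v1"  # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}       # placeholder credential


def call_agent(agent: str, user_input: str, session_id: str) -> str:
    """Route a user request to a named pre-installed agent and return its reply."""
    resp = requests.post(
        f"{API_BASE}/agents/{agent}/invoke",
        headers=HEADERS,
        json={"input": user_input, "session_id": session_id},
    )
    resp.raise_for_status()
    return resp.json()["reply"]


if __name__ == "__main__":
    # Example: a learning device forwards an already-transcribed spoken question
    # to the (hypothetical) homework-tutoring agent.
    reply = call_agent("homework_tutor",
                       "How do I factor x^2 - 5x + 6?",
                       session_id="device-001")
    print(reply)
```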
Comprehensive Openness, Helping Hardware Manufacturers Seize the AI Terminal Trend
Alibaba Cloud emphasized that the kit supports private deployment and cloud-edge collaboration, balancing data security with response speed, and is suitable for devices with different levels of computing power. Alibaba Cloud will also provide hardware reference designs, testing and certification, and ecosystem connection services to help partners bring products to market quickly.
"In the future, every intelligent device should have multimodal interaction capabilities," said the head of Alibaba Cloud's intelligent hardware division. "Our goal is to let developers focus on product innovation rather than underlying model training."
AIbase Observation: Large Model Providers Are Shifting from 'API Output' to 'Hardware Empowerment'
Against the backdrop of the AI terminal boom, Alibaba Cloud's move marks a shift in its strategic focus from providing general APIs to deeply embedding itself in the hardware supply chain. By packaging the Tongyi large models into modular and scenario-based development kits, Alibaba Cloud not only expands the application scenarios of the models but also secures an early position in emerging markets such as AI glasses, educational hardware, and companion robots.
When the "Tongyi Family Pack" becomes the "AI hub" of intelligent hardware, Alibaba Cloud is in effect building an intelligent ecosystem with large models as the base, hardware as the touchpoints, and scenarios closing the loop. The wave of hardware intelligence ignited by this development kit has only just begun.
