On May 11, Mingshi Intelligence, in collaboration with Tsinghua University and the OpenBMB open-source community, officially released MiniCPM-V4.6, a new edge-side multimodal large model. With only 1.3B parameters, this "lightweight" model challenges the performance ceiling of larger models through its high intelligence density and cross-platform adaptability, accelerating the practical deployment of edge-side AI.

1. Performance Peak: "Outstanding Performance" with 1.3B Parameters
MiniCPM-V4.6 comes in two versions, Instruct and Thinking. Both show strong reasoning and understanding capabilities across evaluations relative to models of the same size:
Global Leadership: On the Artificial Analysis (AA) leaderboard, MiniCPM-V4.6 scored 13 points. This significantly outperforms similarly sized competitors (such as Alibaba's Qwen3.5-0.8B and Google's Gemma4-E2B-it) and approaches the performance of larger models like Qwen3.5-2B, setting a performance benchmark among 1B-level models.
Advanced Capabilities: The model performs strongly across general image-text understanding, complex STEM and mathematical reasoning, challenging document OCR, and video temporal understanding. The Thinking version does particularly well in multi-image reasoning and hallucination suppression.
2. Efficiency Revolution: Extreme "Intelligence Density" on the Edge
To address the "memory anxiety" of edge deployment, MiniCPM-V4.6 has been heavily optimized for inference speed and resource usage:
Low Barrier to Entry: The memory requirement has been reduced to 6GB, so mainstream smartphones, PCs, and smart home devices can run the model smoothly.
Inference Efficiency: Running on vLLM, inference throughput reaches 1.5 times that of competitors. When processing a 3136² ultra-high-definition image on the edge, the first-response latency is only 75.7ms, which is 2.2 times faster than competitors.
Throughput Capability: A single GPU can generate text at 7013 tokens/s and process 1344² images at 54.79 images per second.
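To put the throughput figures above on a more familiar scale, the short sketch below converts them into amortized per-item times and compares the pixel counts of the two image sizes mentioned. It uses only the numbers quoted in this section; the variable and function names are illustrative. Note that a time derived from throughput is an average under batching, not the latency of a single request.

```python
def per_item_ms(items_per_second: float) -> float:
    """Amortized time per item, in milliseconds, implied by a throughput figure."""
    return 1000.0 / items_per_second


# Figures quoted in the article.
image_ms = per_item_ms(54.79)   # 1344 x 1344 images per second on a single GPU
token_ms = per_item_ms(7013)    # generated tokens per second on a single GPU

# How many times more pixels a 3136 x 3136 image has than a 1344 x 1344 image.
pixel_ratio = (3136 ** 2) / (1344 ** 2)

print(f"amortized time per 1344^2 image: {image_ms:.2f} ms")
print(f"amortized time per generated token: {token_ms:.4f} ms")
print(f"3136^2 vs 1344^2 pixel ratio: {pixel_ratio:.2f}x")
```

The pixel ratio is a reminder that the 75.7ms first-response figure and the 54.79 images/s figure refer to different input sizes and cannot be compared directly.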
3. Technical Core: LLaVA-UHD v4 Reduces Overhead
The model can "go light" thanks to LLaVA-UHD v4, a technology jointly developed by Mingshi Intelligence and Tsinghua University:
Encoding Reconstruction: By redesigning the ViT image encoding and shallow compression modules, image encoding overhead is cut by 50%, and floating-point operations for high-resolution inputs are reduced by 55.8%.
Hybrid Compression Mechanism: The model supports 4x/16x hybrid token compression, allowing flexible switching between "performance priority" and "speed priority". This technology has already been validated in Kuaishou's large recommendation model OneRec, where it supports massive traffic.
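To make the 4x/16x trade-off concrete, here is a minimal sketch of how the number of visual tokens passed to the language model would shrink under each ratio. It rests on assumptions not stated in the article: a ViT patch size of 14 pixels (a common choice, used here only for illustration) and a single pass over the whole image with no slicing. The function `visual_tokens` is hypothetical and is not part of any MiniCPM-V or LLaVA-UHD API.

```python
def visual_tokens(side_px: int, patch_px: int = 14, compression: int = 16) -> int:
    """Estimate visual tokens for a square image after token compression.

    Assumes (for illustration only) one ViT patch per patch_px x patch_px
    pixel block and a uniform compression ratio over all patches.
    """
    patches_per_side = side_px // patch_px
    total_patches = patches_per_side ** 2
    return total_patches // compression


# Image sizes mentioned in the article, under both compression ratios.
for side in (1344, 3136):
    for ratio in (4, 16):
        tokens = visual_tokens(side, compression=ratio)
        print(f"{side}x{side} image, {ratio}x compression: {tokens} tokens")
```

Under these assumptions, 16x compression leaves a quarter as many visual tokens as 4x. That is the sense in which 4x favors "performance priority" (more visual detail kept) and 16x favors "speed priority" (a shorter sequence for the language model).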
4. Ecosystem Adoption: From Lab to Industrial Frontline
The open-source release of MiniCPM-V4.6 is a win not only for the technology but also for the ecosystem:
Easy Development: It is deeply compatible with fine-tuning frameworks such as ms-swift and LLaMA-Factory, allowing developers to perform full-parameter fine-tuning on a single RTX 4090 GPU.
Full Platform Compatibility: It supports mainstream inference frameworks such as vLLM and Ollama, and provides test builds for iOS, Android, and HarmonyOS, bringing AI to a wider range of hardware devices.
Industry Deployment: The series has already been deployed in fields such as automotive, PC, smart home, and industrial inspection, with partners including industry leaders such as Lenovo, Geely, SAIC Volkswagen, Xiaomi, and OPPO.
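The single-GPU fine-tuning claim above can be sanity-checked with a rough memory estimate. The sketch below assumes a standard mixed-precision Adam setup (bf16 weights and gradients, fp32 master weights and two fp32 optimizer moments, roughly 16 bytes per parameter) and the RTX 4090's 24 GB of memory. The article does not describe the actual training recipe, so treat this as a back-of-the-envelope check that ignores activations and framework overhead.

```python
# Bytes of persistent training state per parameter under mixed-precision Adam.
# This breakdown is an assumed, commonly used rule of thumb, not the
# configuration reported for MiniCPM-V4.6.
BYTES_PER_PARAM = {
    "bf16 weights": 2,
    "bf16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam first moment": 4,
    "fp32 Adam second moment": 4,
}

RTX_4090_MEMORY_GB = 24  # nominal memory of one RTX 4090


def training_state_gb(num_params: float) -> float:
    """Memory for weights, gradients, and optimizer state, in GB (10^9 bytes)."""
    return num_params * sum(BYTES_PER_PARAM.values()) / 1e9


state_gb = training_state_gb(1.3e9)  # 1.3B parameters, as stated in the article
headroom = RTX_4090_MEMORY_GB - state_gb

print(f"training state: {state_gb:.1f} GB")
print(f"left for activations and overhead: {headroom:.1f} GB")
```

Under these assumptions the training state alone fits within 24 GB, but with little room to spare. In practice a setup like this would likely rely on small batch sizes, gradient checkpointing, or memory-efficient optimizers.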
