Recently, the LongCat team at Meituan released a new benchmark called UNO-Bench, designed to systematically evaluate how well multimodal large language models understand content across different modalities. The benchmark covers 44 task types and five modality combinations, aiming to give a comprehensive picture of model performance in both single-modal and full-modal (omni-modal) scenarios.

The core of UNO-Bench is its dataset. The team curated 1,250 full-modal samples with a cross-modal solvability of 98%, supplemented by 2,480 enhanced single-modal samples. The samples are grounded in real-world applications, with particular attention to Chinese-language contexts. Notably, an automated compression step makes evaluation 90% faster while preserving 98% consistency with results across 18 public benchmarks.


To better assess complex reasoning, UNO-Bench also introduces a multi-step open-ended question format, paired with a general-purpose scoring model that automatically grades six different question types with 95% accuracy. This evaluation method offers a new approach to judging multimodal models beyond multiple-choice accuracy.
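To illustrate how scoring a multi-step open-ended answer might work in practice, here is a minimal Python sketch of a judge-model grading loop. The function names (`score_multi_step_answer`, `judge_model`), the prompt wording, and the step-by-step rubric are illustrative assumptions, not the actual UNO-Bench scoring model.

```python
from typing import Callable, List


def score_multi_step_answer(
    question: str,
    reference_steps: List[str],
    candidate_answer: str,
    judge_model: Callable[[str], str],
) -> float:
    """Hypothetical sketch: ask a judge model whether each reference reasoning
    step is satisfied by the candidate answer, then return the fraction of
    steps credited. This is an assumed design, not UNO-Bench's implementation."""
    if not reference_steps:
        return 0.0
    credited = 0
    for step in reference_steps:
        prompt = (
            f"Question: {question}\n"
            f"Required reasoning step: {step}\n"
            f"Candidate answer: {candidate_answer}\n"
            "Does the candidate answer satisfy this step? Reply YES or NO."
        )
        verdict = judge_model(prompt).strip().upper()
        credited += int(verdict.startswith("YES"))
    return credited / len(reference_steps)
```

In such a setup, `judge_model` would wrap whatever scoring model is used, and the final benchmark score would aggregate these per-step fractions across samples.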


Currently, UNO-Bench focuses mainly on Chinese-language scenarios. The team says it is actively seeking partners to jointly develop English and multilingual versions. Interested developers can download the UNO-Bench dataset from the Hugging Face platform, and the related code and project documentation are publicly available on GitHub.
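As a quick start, the dataset can typically be loaded with the Hugging Face `datasets` library. The repository id used below ("meituan-longcat/UNO-Bench") is an assumption inferred from the project name; check the official project page for the exact id and split names.

```python
from datasets import load_dataset

# Assumed repository id; verify against the official UNO-Bench page.
ds = load_dataset("meituan-longcat/UNO-Bench")

# Inspect the available splits and a sample record.
print(ds)
print(ds[list(ds.keys())[0]][0])
```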

With the release of UNO-Bench, evaluation standards for multimodal large language models stand to improve further, giving researchers a powerful tool and helping drive progress across the industry.

Project address: https://meituan-longcat.github.io/UNO-Bench/