Ant Group Releases Benchmark for Large Model Evaluation in the DevOps Field


At the 9th Digital China Construction Summit, Ant Group showcased for the first time its full-stack 'Data+AI' layout, spanning everything from underlying technology to industrial applications. The event marks a new stage in its data strategy: the 'Intelligent and Trustworthy Data Flow'. By integrating large models into everyday scenarios, the company is putting AI tools into practical use. Its medical AI application 'Ant A Fu' has served more than 100 million users and has partnered with the Fuzhou Health Commission.
Within ten hours of DeepSeek-V4's release, the DCAI team at Peking University produced a comprehensive automated evaluation report using the newly released open-source One-Eval evaluation framework. Traditional large model evaluation is cumbersome and requires significant effort to set up testing pipelines. By automating much of this work, One-Eval substantially improves efficiency, marking a new stage for the industry.
Ant Group took first place in the "Robustness Sample Testing in Complex Real-World Scenarios" and "Facial Enhancement Anomaly Detection" tracks of the NTIRE Challenge at CVPR 2026. The result helps strengthen risk identification in scenarios such as payments, content moderation, and financial identity authentication. Deepfakes and AIGC misuse pose growing challenges, and detection models often fall short in real-world conditions and amid the rapid iteration of multimodal large models; this breakthrough provides important technical support in response.
Ant AI Security Lab audited OpenClaw and found 33 vulnerabilities. The latest OpenClaw release fixes 8 of them: 1 critical, 4 high-risk, and 3 medium-risk. Ant Group will continue monitoring OpenClaw's security to support the safe deployment of AI agent applications.
Ant Group and Shanghai Jiao Tong University jointly released the F2LLM-v2 series of embedding models, which aim to break the English-centric bias in semantic representation. The models achieved state-of-the-art results on 11 leaderboards of the widely used MTEB benchmark. Fully open-source, the series combines strong performance with high efficiency, giving developers worldwide advanced semantic representation tools.