Long-form video understanding, long regarded as a "deep water zone" of the field, now has an authoritative evaluation standard. On March 2, 2026, the Long Video Retrieval (LoVR) benchmark, jointly proposed by Peking University and an industry partner, was officially accepted by the top international conference WWW 2026 (The Web Conference).
This achievement fills an industry gap in multi-granularity retrieval evaluation for real long-form videos.
Core Breakthroughs: Solving the "Three Major Mountains" of Long Video Retrieval
Traditional video retrieval benchmarks are mostly limited to short videos (e.g., TikTok-style clips) and struggle with the complex semantic associations found in long videos. LoVR was designed to address three major pain points:
Full-Granularity Coverage: It supports both video-level macro retrieval and clip-level fine-grained localization, meeting needs that range from "find the whole movie" to "find a specific moment."
Scalable High-Quality Annotation: A novel annotation pipeline combines automated caption generation by vision-language models (VLMs), automatic quality scoring, and dynamic correction, enabling cost-effective, scalable construction of high-quality multimodal data.
Modeling Real-World Scenarios: Systematically captures real challenges in long video retrieval, such as long-range semantic drift and high information density.
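To make the full-granularity evaluation concrete, here is a minimal sketch of how recall@k could be computed over embeddings, where the same metric is applied once with whole-video summaries as the gallery (video-level retrieval) and once with clip captions (clip-level localization). This is an illustrative protocol, not LoVR's actual evaluation code; the embeddings below are random placeholders.

```python
import numpy as np

def recall_at_k(query_embs, gallery_embs, gt_indices, k):
    """Fraction of queries whose ground-truth item ranks in the top-k.

    query_embs:   (Q, D) text-query embeddings (L2-normalized)
    gallery_embs: (G, D) video- or clip-level embeddings (L2-normalized)
    gt_indices:   gallery index of the correct item for each query
    """
    sims = query_embs @ gallery_embs.T           # cosine similarity matrix
    topk = np.argsort(-sims, axis=1)[:, :k]      # top-k gallery ids per query
    hits = [gt in row for gt, row in zip(gt_indices, topk)]
    return float(np.mean(hits))

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Toy data: a gallery of 5 items and 3 queries that are noisy
# copies of gallery items 0, 2, and 4.
rng = np.random.default_rng(0)
gallery = normalize(rng.normal(size=(5, 4)))
queries = normalize(gallery[[0, 2, 4]] + 0.1 * rng.normal(size=(3, 4)))
print(recall_at_k(queries, gallery, gt_indices=[0, 2, 4], k=1))
```

Swapping the gallery from video summaries to clip captions is all it takes to move between the two granularities, which is why a single benchmark can cover both.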
Technical Specifications: Supporting Over 40,000 Fine-Grained Clips
Massive Data: Includes 467 real long videos, with an average duration of over 25 minutes.
Fine-Grained Slicing: Provides 40,804 fine-grained clips, each paired with a high-quality text caption verified by both humans and machines.
Semantic Fusion Technology: Introduces semantic fusion methods to ensure no key contextual information is lost when generating full-video summaries, providing a unified evaluation platform for long-range semantic modeling.
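The annotation pipeline described above can be sketched as a quality-gated loop: a VLM drafts a clip caption, a scorer rates it, low-scoring drafts are revised, and the accepted clip captions are fused into a full-video summary. The function names (`generate_caption`, `score_caption`, `revise_caption`, `fuse_summary`) and thresholds are hypothetical stand-ins for real model calls; only the control flow is the point.

```python
THRESHOLD = 0.8   # hypothetical minimum quality score to accept a caption
MAX_ROUNDS = 3    # cap on correction attempts per clip

def generate_caption(clip_id: str) -> str:
    return f"draft caption for {clip_id}"        # placeholder for a VLM call

def score_caption(caption: str) -> float:
    return 0.5 if "draft" in caption else 0.9    # placeholder quality scorer

def revise_caption(caption: str) -> str:
    return caption.replace("draft", "revised")   # placeholder correction step

def annotate_clip(clip_id: str) -> str:
    caption = generate_caption(clip_id)
    for _ in range(MAX_ROUNDS):
        if score_caption(caption) >= THRESHOLD:
            break
        caption = revise_caption(caption)        # dynamic correction
    return caption

def fuse_summary(clip_captions: list[str]) -> str:
    # Naive stand-in for semantic fusion: a real system would merge clip
    # captions with a model so no key contextual information is lost.
    return " ".join(clip_captions)

caps = [annotate_clip(f"clip_{i}") for i in range(3)]
print(fuse_summary(caps))
```

Gating on an automatic score is what makes the pipeline scale: human review is needed only for verification, not for writing every caption from scratch.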
Industry Insights: University-Enterprise Collaboration Promotes AGI Implementation
As a product of university-enterprise joint research with Peking University, LoVR's acceptance signals that leading domestic database companies are moving beyond storage and compute alone into the frontier of "vector retrieval + multimodal understanding." As long videos explode across streaming media, surveillance, and online education, the multi-granularity retrieval standard LoVR provides will become an important cornerstone for future video search engines and AI editing assistants to achieve "reliability."
