Retrieval-augmented generation (RAG) has a new open-source framework. The THUNLP Lab at Tsinghua University, the NEUIR Lab at Northeastern University, OpenBMB, and AI9Stars have jointly released UltraRAG2.1, billed as the world's first open-source RAG framework built on the Model Context Protocol (MCP) architecture. The release greatly simplifies building multimodal retrieval systems: researchers can set up multi-stage reasoning, generation, and evaluation pipelines with a few lines of YAML configuration and no hand-written code, which significantly lowers the technical barrier.

Three Core Upgrades Define the Next-Generation RAG Standard

Native Multimodal Support, Closing the Text-Image Retrieval Loop

UltraRAG2.1 includes an integrated retrieval-generation-evaluation pipeline that handles not only text but also multimodal data such as images and PDFs. Its VisRAG Pipeline can parse local PDF documents directly, automatically extract text and charts, build cross-modal indexes, and support hybrid "image-to-text" and "text-to-image" retrieval. This suits high-value scenarios such as scientific paper analysis and question answering over technical manuals.

Automatic Knowledge Base Construction, Deep Integration with MinerU

The framework supports intelligent parsing and semantic chunking of multiple formats, including Word, PDF, and Markdown, and integrates the open-source document processing tool MinerU, so an enterprise-grade private knowledge base can be built in one click. Users do not need to clean or annotate data by hand; the system completes the structured processing automatically, which significantly improves knowledge management efficiency.
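To make the chunking step concrete, here is a minimal sketch in Python. It is not UltraRAG's or MinerU's implementation, and it swaps in a simpler technique than semantic chunking: it splits Markdown at headings and then packs paragraphs into chunks up to a size limit. The function name `chunk_markdown` and the `max_chars` parameter are hypothetical, chosen only for illustration.

```python
import re
from typing import List


def chunk_markdown(text: str, max_chars: int = 200) -> List[str]:
    """Split Markdown at headings, then pack paragraphs up to max_chars.

    Illustrative only: structural chunking, not semantic chunking.
    """
    # Split just before every line that starts with 1-6 '#' and a space.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks: List[str] = []
    for section in sections:
        current = ""
        # Paragraphs are separated by one or more blank lines.
        for para in re.split(r"\n\s*\n", section.strip()):
            if not para:
                continue
            candidate = f"{current}\n\n{para}" if current else para
            if current and len(candidate) > max_chars:
                # Adding this paragraph would overflow: close the chunk.
                chunks.append(current)
                current = para
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks


doc = "# Setup\n\nInstall the tool.\n\n## Usage\n\nRun the pipeline."
for chunk in chunk_markdown(doc):
    print(repr(chunk))
```

Splitting at headings first keeps each chunk within a single section, so a retrieved chunk never mixes text from two unrelated topics. A paragraph longer than `max_chars` is kept whole rather than cut mid-sentence.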

Unified Workflow + Standardized Evaluation, Results Are Explainable and Optimizable

UltraRAG2.1 provides an end-to-end visual RAG workflow that is compatible with various retrieval engines (such as Elasticsearch and FAISS) and generation models (Llama, Qwen, Kimi, etc.). It also introduces a standardized evaluation system that quantifies result quality along dimensions such as relevance, fidelity, and fluency, so developers can identify bottlenecks and iterate quickly.

MCP Architecture: Making RAG Truly "Composable and Scalable"

Unlike traditional RAG systems, where the pipeline is hard-coded, UltraRAG2.1 is built on the Model Context Protocol (MCP) and decouples modules such as retrieval, reasoning, and generation into standardized "intelligent agents." Complex task flows can then be assembled declaratively in YAML. For example, a few lines of configuration can implement a three-stage workflow: "first retrieve technical documents → then call a code generation model → finally use an evaluation module to verify the output."
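The composable idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not UltraRAG's API or its YAML schema: the `REGISTRY`, the `stage` decorator, `run_pipeline`, and the three stage functions are all hypothetical names, and the stage bodies are stand-ins for real retrievers, models, and evaluators. The point is only that once stages share one interface, the workflow reduces to an ordered list of names that could equally be loaded from a configuration file.

```python
from typing import Callable, Dict, List

# Hypothetical stage registry; none of these names come from UltraRAG.
Stage = Callable[[dict], dict]
REGISTRY: Dict[str, Stage] = {}


def stage(name: str) -> Callable[[Stage], Stage]:
    """Register a function as a named pipeline stage."""
    def register(fn: Stage) -> Stage:
        REGISTRY[name] = fn
        return fn
    return register


@stage("retrieve")
def retrieve(state: dict) -> dict:
    # Toy keyword retriever: keep documents sharing a word with the query.
    words = set(state["query"].lower().split())
    hits = [d for d in state["corpus"] if words & set(d.lower().split())]
    return {**state, "docs": hits}


@stage("generate")
def generate(state: dict) -> dict:
    # Stand-in for a model call: echo the retrieved context.
    answer = " ".join(state["docs"]) or "no context found"
    return {**state, "answer": answer}


@stage("evaluate")
def evaluate(state: dict) -> dict:
    # Toy check: does the answer contain at least one retrieved document?
    grounded = any(d in state["answer"] for d in state["docs"])
    return {**state, "grounded": grounded}


def run_pipeline(steps: List[str], state: dict) -> dict:
    """Run the named stages in order, threading the state through."""
    for name in steps:
        state = REGISTRY[name](state)
    return state


# In a declarative framework this list would come from a config file.
PIPELINE = ["retrieve", "generate", "evaluate"]

result = run_pipeline(PIPELINE, {
    "query": "install guide",
    "corpus": ["Install the package with pip.", "Release notes for v2."],
})
print(result["answer"], result["grounded"])
```

Because each stage only reads and returns a state dictionary, swapping a retriever or adding a second evaluation pass means editing the `PIPELINE` list rather than the code that runs it.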

AIbase believes that the release of UltraRAG2.1 marks a shift in RAG technology from "tool assembly" to an "engineering paradigm." With multimodal understanding, knowledge base construction, and performance evaluation unified in a lightweight, open-source, low-code framework, enterprises and researchers will be able to apply large model capabilities to real-world business scenarios more efficiently. This innovation, led by the Chinese community, is injecting new momentum into the global RAG ecosystem.