Adobe and Universities Launch METAL Framework: Multi-Agent Collaboration for Precise Chart Generation

Generating charts that faithfully reproduce a reference design remains a difficult problem in data visualization. A chart demands precise layout, color, and text placement, and those visual details must then be translated into code that reproduces the intended design. Traditional methods usually prompt a vision-language model (VLM) such as GPT-4V directly, and these models often struggle to turn complex visual elements into syntactically correct Python code. Even minor errors can leave a chart short of its design goals, which is a serious problem in fields such as financial analysis, academic research, and educational reporting.

To address this, researchers from UCLA, UC Merced, and Adobe Research have introduced a novel framework called METAL. This system decomposes the chart generation task into a series of focused steps managed by specialized agents.

The METAL framework comprises four key agents: a generation agent, a visual assessment agent, a code assessment agent, and a revision agent. The generation agent is responsible for the initial Python code generation. The visual assessment agent evaluates how well the generated chart matches the reference chart. The code assessment agent reviews the generated code for any syntax or logical errors. Finally, the revision agent adjusts the code based on the assessment feedback.
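The division of labor among the four agents can be illustrated with a minimal Python sketch. It is not METAL's implementation: the real agents are vision-language models working on chart images, whereas every function here (toy_generate, toy_visual_critic, toy_code_critic, toy_revise, run_pipeline) is a hypothetical stand-in, the reference chart is reduced to a small dictionary, and the max_rounds stopping rule is an assumption.

```python
import ast
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Toy stand-in for a reference chart image (assumption: METAL works on real images).
Reference = Dict[str, str]


@dataclass
class Feedback:
    source: str                                   # "visual" or "code"
    issues: List[str] = field(default_factory=list)


def toy_generate(ref: Reference) -> str:
    # Generation agent: a deliberately flawed first draft
    # (wrong color and an unclosed parenthesis).
    return f"plt.{ref['kind']}(x, y, color='blue'"


def toy_visual_critic(ref: Reference, code: str) -> Feedback:
    # Visual assessment agent: checks only whether the requested color appears.
    fb = Feedback("visual")
    if f"color='{ref['color']}'" not in code:
        fb.issues.append("color")
    return fb


def toy_code_critic(ref: Reference, code: str) -> Feedback:
    # Code assessment agent: checks only that the code parses as Python.
    fb = Feedback("code")
    try:
        ast.parse(code)
    except SyntaxError:
        fb.issues.append("syntax")
    return fb


def toy_revise(ref: Reference, code: str, feedback: List[Feedback]) -> str:
    # Revision agent: applies one fix per reported issue.
    issues = {issue for fb in feedback for issue in fb.issues}
    if "color" in issues:
        code = code.replace("color='blue'", f"color='{ref['color']}'")
    if "syntax" in issues and code.count("(") > code.count(")"):
        code += ")"
    return code


def run_pipeline(ref, generate, visual_critic, code_critic, revise,
                 max_rounds: int = 3) -> Tuple[str, int]:
    """Generate once, then alternate assessment and revision.

    Returns the final code and the number of revision rounds used.
    """
    code = generate(ref)
    for round_no in range(max_rounds):
        feedback = [visual_critic(ref, code), code_critic(ref, code)]
        if not any(fb.issues for fb in feedback):
            return code, round_no
        code = revise(ref, code, feedback)
    return code, max_rounds


if __name__ == "__main__":
    final_code, rounds = run_pipeline(
        {"kind": "bar", "color": "red"},
        toy_generate, toy_visual_critic, toy_code_critic, toy_revise,
    )
    print(final_code, rounds)
```

In this toy run the first draft fails both assessments, a single revision round repairs the color and the missing parenthesis, and the second assessment pass accepts the result. The two critics are kept as separate functions so that each reports only one kind of problem.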

METAL's modular design is a key strength. By assigning visual interpretation and code generation tasks to different agents, each agent can focus on its specific function. This approach ensures that both the visual and technical aspects of the chart are fully considered and refined, leading to improved accuracy and consistency in chart generation.

Experiments on the ChartMimic dataset showed METAL outperforming traditional methods in text clarity, chart type accuracy, color consistency, and layout precision. In comparisons involving the open-source model LLaMA 3.2-11B and the closed-source model GPT-4o, METAL generated charts that matched the reference charts more closely.

Furthermore, ablation studies highlighted the importance of separating visual and code assessment mechanisms. When these two components were merged into a single assessment agent, performance often degraded, indicating that specialized assessment methods are crucial for high-quality chart generation.

METAL offers a balanced multi-agent approach by breaking down the task into specialized, iterative steps. This approach not only facilitates the precise translation of visual designs into Python code but also provides a systematic process for error detection and correction. With increased computational resources, METAL's performance shows near-linear improvement, offering practical potential for applications demanding high accuracy.

Project: https://metal-chart-generation.github.io/

Key Highlights:
🌟 The METAL framework, developed jointly by UCLA, UC Merced, and Adobe, aims to optimize the chart generation process.
🔍 The framework incorporates four specialized agents responsible for generating, assessing, and revising charts, ensuring proper handling of visual and technical elements.
📈 Experimental results demonstrate that METAL surpasses traditional methods in chart generation accuracy and consistency, showcasing promising practical potential.

Alibaba Tongyi Qianwen Launches Qwen3-VL Lightweight Models: 4B and 8B Versions Approach the Performance of the Previous 72B Flagship

The Alibaba Tongyi Qianwen team has launched two lightweight models in the Qwen3-VL series, with parameter scales of 4B and 8B. The team describes Qwen3-VL as its strongest family of vision-language models to date, and the small-parameter versions are meant to lower deployment barriers while maintaining strong performance. Each scale comes in two versions, one tuned for instruction following and one for chain-of-thought reasoning, giving developers more flexible options.

Google Launches New Vision-Language Model PaliGemma 2 Mix Integrating Multiple Functions to Aid Developers

Recently, Google announced the release of a brand new Vision-Language Model (VLM) called PaliGemma 2 Mix. This model combines image processing and natural language processing capabilities, allowing it to understand visual information and text input simultaneously, generating corresponding outputs as needed. This marks a significant breakthrough in artificial intelligence technology for multi-task processing. PaliGemma 2 Mix boasts powerful features, integrating image description and optical character recognition.

Google's AGI Robot Breakthrough: 54-Member Team's 7-Month Work, High Generalization and Reasoning

The robotics research team at Google DeepMind recently released a robotics project called RT-2. The project took 7 months to develop and is trained on top of a large model. RT-2 has capabilities such as symbol understanding, reasoning, and human recognition, and can think through and complete tasks based on human instructions. By combining the large model with the robot's operational capabilities, RT-2 can accomplish tasks that involve logical leaps, such as going from 'extinct animals' to 'plastic dinosaurs'. The project performed well across various subcategory tests, with performance up to three times that of the previous generation of robot models. The result demonstrates the potential of large models in robotics research and is expected to drive future robot development.

RWKV: Small Team Aims to Be Android of AI Era with Big Model

Meta Intelligence OS is a startup founded by Peng Bo. It has developed a series of large models based on the open-source model RWKV and aims to become the Android of the large-model era. The RWKV model offers strong performance and low cost in inference tasks, which has attracted customers from industries such as finance, legal services, and smart hardware. The company's business model is model customization based on private data and internal AI Agent development. It hopes to solve the problems of API call latency and data security by deploying large models on terminal devices. Currently, RWKV versions are available on Windows, Mac, and Linux computers, and Android and iOS versions are in development. Meta Intelligence OS is raising funds and collaborating with chip companies and computing power platforms to create benchmark customers. Luo Xuan said that the decisive battlefield for large models is hardware, and that both terminal devices and the cloud require dedicated chips.

AMD Unveils New Radeon RX 9070 Series Graphics Cards with Significant Performance Boost, Rivaling RTX 50

Meeting the high expectations of professionals and gamers alike, AMD recently launched the new Radeon RX 9070 and 9070 XT graphics cards. These cards boast a 20% to 40% performance increase over their predecessors and are slated for release on March 6th. AMD first showcased these new products at CES in January. The Radeon RX 9000 series utilizes the advanced RDNA4 graphics architecture, supporting...
