Ant Group open-sources Ming-lite-omni, billed as the first open-source multimodal model comparable to GPT-4o

At the recent Ant Technology Day, Ant Group's Bai Ling team announced a significant decision: it will fully open-source its multimodal large model Ming-lite-omni. The move marks another major openness initiative by Ant Group in the AI field, and the industry regards the model as the first open-source model that can rival GPT-4o in modality support.

A 22-billion-parameter technical breakthrough

Ming-lite-omni is built on Ling-lite and uses a Mixture of Experts (MoE) architecture with 22 billion total parameters, of which 3 billion are active. This parameter scale is a new high among open-source multimodal models and reflects Ant Group's deep expertise in large model technology.
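The article does not describe how Ming-lite-omni routes tokens among its experts. As a rough illustration of why an MoE model's active parameter count is far smaller than its total, here is a minimal, generic top-k routing sketch; the `TopKMoE` class, the expert count, `k`, and the dimensions are all hypothetical and are not taken from Ming-lite-omni or Ling-lite.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class TopKMoE:
    """Toy Mixture-of-Experts layer: a router sends each token to k of n experts.

    Hypothetical illustration only. Each expert is a single d x d linear map
    for brevity; experts in real MoE transformers are feed-forward blocks.
    """

    def __init__(self, d, n_experts, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        # Router: one score vector per expert.
        self.router = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_experts)]
        # Experts: independent d x d weight matrices.
        self.experts = [
            [[rng.gauss(0, 1 / math.sqrt(d)) for _ in range(d)] for _ in range(d)]
            for _ in range(n_experts)
        ]

    def forward(self, x):
        """Return the mixed output for token vector x and the expert ids used."""
        logits = matvec(self.router, x)
        # Keep only the k highest-scoring experts; the rest are never evaluated.
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.k]
        gates = softmax([logits[i] for i in top])
        out = [0.0] * len(x)
        for gate, i in zip(gates, top):
            y = matvec(self.experts[i], x)
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out, top

    def active_expert_fraction(self):
        """Share of expert weights used per token (equal-sized experts assumed)."""
        return self.k / len(self.experts)


if __name__ == "__main__":
    layer = TopKMoE(d=4, n_experts=8, k=2)
    out, used = layer.forward([1.0, 0.5, -0.3, 2.0])
    print("experts used:", used)
    print("active expert fraction:", layer.active_expert_fraction())
```

With `k=2` of 8 equal-sized experts, each token touches a quarter of the expert weights. The headline 3B/22B ratio (about 13.6%) is not simply `k / n_experts`, because a real model's active count also includes shared weights, such as attention layers and embeddings, that every token uses.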

The model weights and inference code for Ming-lite-omni are now fully open to the public, and the training code and training data will be released in later stages, giving developers worldwide comprehensive technical support.

Ongoing open-source strategy shows results

This year, the Bai Ling large model team has open-sourced a steady stream of models, including the large language models Ling-lite and Ling-plus, the multimodal large model Ming-lite-uni, and a preview version of Ming-lite-omni.

Ling-lite-1.5, open-sourced in mid-May, performs close to state-of-the-art (SOTA) models of the same size class, landing between Qwen's 4B and 8B models. The release also verified the feasibility of training a 300B-scale SOTA MoE large language model on non-high-end computing platforms.

Performance comparable to international top-tier models

In multiple evaluations of understanding and generation, Ming-lite-omni matches or exceeds leading 10B-scale multimodal large models. Ant Group describes it as the first open-source model that can rival GPT-4o in modality support, giving developers worldwide an important technical option and reference point.

Xiting, head of the Bai Ling large model team, explained the team's technical route: "We are firmly committed to the MoE architecture for both large language models and multimodal large models, and we make extensive use of non-high-end computing platforms. This has demonstrated that domestic GPUs can train models comparable to GPT-4o."

Study: Global AI Chipset Market to Exceed $700B with 31.8% CAGR

According to TMR Research, the global artificial intelligence chipset market is expected to exceed $700 billion, growing at a compound annual growth rate (CAGR) of 31.8% from 2022 to 2031. The article discusses development trends, application areas, and key players in the AI chipset market, making it timely and valuable reading for anyone following this sector.
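The two figures in the TMR Research forecast are linked by ordinary compound-growth arithmetic. The helpers below (hypothetical names, standard library only) project a value forward, invert the projection, and recover a CAGR. The usage assumes, since the summary does not say, that the $700 billion figure is reached in 2031 after nine years of compounding from a 2022 base.

```python
def project(value: float, rate: float, years: float) -> float:
    """Grow `value` at a constant annual `rate` for `years` years."""
    return value * (1 + rate) ** years


def implied_start(end_value: float, rate: float, years: float) -> float:
    """Invert compounding: the starting value that grows into `end_value`."""
    return end_value / (1 + rate) ** years


def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    # Assumption (not stated in the summary): $700B is the 2031 value,
    # i.e. 9 compounding years after a 2022 base.
    years = 2031 - 2022
    base_2022 = implied_start(700.0, 0.318, years)
    print(f"Implied 2022 market size: ${base_2022:.1f}B")
    # Round trip: growing the base back should recover the stated CAGR.
    print(f"Recovered CAGR: {cagr(base_2022, 700.0, years):.3f}")
```

Under that assumption, the implied 2022 market size comes out at roughly $58 billion. This is back-of-envelope arithmetic, not a number reported by TMR Research.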

IBM Research: How AI & Automation Protect Businesses from Data Breaches

IBM's report presents strong evidence that artificial intelligence, automation, and threat intelligence can help address data breaches throughout their lifecycle and reduce costs. The research found that integrating AI and automation into security operations teams can shorten the data breach lifecycle by 33% and cut costs by 33.6%. However, only 28% of enterprises currently apply AI and automation extensively. Many enterprises still rely on legacy systems that attackers can easily bypass. The article's significance lies in highlighting how effectively AI and automation improve cybersecurity and in urging enterprises to adopt these technologies widely to protect their data.

Google's AGI Robot Breakthrough: 54-Member Team's 7-Month Work, High Generalization and Reasoning

Google DeepMind's robotics research team recently released RT-2, a robotics project that took 7 months to develop and is trained using a large model. RT-2 offers capabilities such as symbol understanding, reasoning, and human recognition, and it can reason about human instructions and carry out the corresponding tasks. By combining a large model with the robot's manipulation abilities, RT-2 can handle tasks that require logical leaps, such as connecting the instruction 'extinct animal' to a plastic dinosaur. The project performed well across various sub-category tests, reaching up to three times the performance of the previous generation of robot models. The result demonstrates the potential of large models in robotics research and is expected to drive future robot development.

RWKV: Small Team Aims to Be the Android of the AI Era with Its Large Model

Meta Intelligence OS is a startup founded by Peng Bo. It has developed a series of large models based on the open-source RWKV model and aims to become the Android of the large-model era. RWKV models deliver strong performance at low cost in inference tasks, which has attracted customers in finance, legal services, and smart hardware. The company's business model centers on customizing models with clients' private data and developing in-house AI agents. By deploying large models on end devices, it hopes to address API call latency and data security concerns. RWKV versions are currently available for Windows, Mac, and Linux computers, with Android and iOS versions in development. Meta Intelligence OS is raising funds and partnering with chip companies and computing platforms to build flagship customer cases. Luo Xuan said that the decisive battleground for large models is hardware, and that both end devices and the cloud need dedicated chips.

Qwen Releases OmniAudio, Which Can Generate Spatial Audio from 360-Degree Videos

The speech team at Qwen Laboratory recently reached a milestone in spatial audio generation with the launch of OmniAudio, a technology that generates FOA (First-order Ambisonics) audio directly from 360-degree videos, opening new possibilities for virtual reality and immersive entertainment. Spatial audio, which simulates real-world auditory environments, can make experiences more immersive. However, most existing approaches are built on fixed-perspective videos and make little use of the spatial information in 360-degree panoramic videos. Traditional video-to-audio generation