In large language model (LLM) research, how to segment text has long been a central question. Traditional tokenization techniques, such as Byte Pair Encoding (BPE), split text into fixed units before processing and build a static vocabulary on top of them. Although widely used, this approach has clear limitations: once tokenization is fixed, the model cannot flexibly adjust how it processes text, and performance suffers noticeably on low-resource languages and unusual character structures.
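To make the contrast concrete, here is a minimal, self-contained Python sketch: a toy static vocabulary (the merges and IDs are hypothetical) versus the raw UTF-8 bytes that a byte-level model consumes.

```python
text = "tokenization"

# Static-vocabulary view: pieces outside the vocabulary fall back to
# an <unk>-style ID (0 here). The merges below are hypothetical.
toy_vocab = {"token": 17, "ization": 42}
tokens = [toy_vocab.get(piece, 0) for piece in ("token", "ization")]
print(tokens)        # [17, 42] -- fixed once the vocabulary is built

# Byte-level view: any string, in any script, maps to byte values 0-255.
byte_ids = list(text.encode("utf-8"))
print(byte_ids[:5])  # [116, 111, 107, 101, 110]
```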


To address these issues, Meta's research team introduced an innovative architecture called AU-Net. AU-Net departs from the traditional tokenization pipeline by using an autoregressive U-Net structure that learns directly from raw bytes, flexibly merging bytes into words and word pairs, and even groups of up to four words, to build multi-level sequence representations.

The design of AU-Net is inspired by the U-Net architecture from medical image segmentation, featuring a contraction path and an expansion path. The contraction path compresses the input byte sequence, merging it into higher-level semantic units to extract the text's macro-level semantics. The expansion path then gradually restores this high-level information to the original sequence length while integrating local details, enabling the model to capture features of the text at multiple levels.
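The toy PyTorch sketch below shows the overall contract-then-expand flow with skip connections. It is illustrative only, not Meta's implementation: the real AU-Net stages are transformer blocks, whereas this sketch stands them in with simple pooling and copy operations.

```python
import torch

def aunet_flow(h: torch.Tensor) -> torch.Tensor:
    """h: (batch, seq_len, dim); seq_len assumed divisible by 4."""
    skips = []
    # Contraction: halve the sequence length twice by mean-pooling pairs,
    # producing shorter, more abstract sequences at each stage.
    for _ in range(2):
        skips.append(h)                            # keep details for later
        b, t, d = h.shape
        h = h.view(b, t // 2, 2, d).mean(dim=2)    # coarser units
    # Expansion: restore the length, adding stored skip states back in.
    for skip in reversed(skips):
        h = h.repeat_interleave(2, dim=1)          # naive upsampling
        h = h + skip                               # skip connection
    return h                                       # back at byte resolution

out = aunet_flow(torch.randn(1, 8, 16))
print(out.shape)   # torch.Size([1, 8, 16])
```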

The contraction path of AU-Net is divided into multiple stages. In the first stage, the model processes raw bytes directly, using a restricted attention window to keep computation tractable. In the second stage, the model pools at word boundaries, abstracting byte-level information into word-level semantics. In the third stage, pooling is applied to every two words, capturing broader semantic context and deepening the model's understanding of the text's meaning.
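As a rough illustration of the second stage's word-boundary pooling, the sketch below keeps only the hidden state of the byte that closes each word. The space-based boundary rule is an assumption for illustration; the released code determines boundaries from a splitting function.

```python
import torch

def pool_at_word_boundaries(h: torch.Tensor, byte_ids: torch.Tensor):
    """h: (seq_len, dim); byte_ids: (seq_len,) raw UTF-8 byte values."""
    is_space = byte_ids == ord(" ")
    # A byte ends a word if the *next* byte is a space, or it is the
    # last byte of the sequence (assumed boundary rule).
    ends = torch.zeros_like(is_space)
    ends[:-1] = is_space[1:]
    ends[-1] = True
    return h[ends]                      # (num_words, dim)

byte_ids = torch.tensor(list(b"hello brave new world"))
h = torch.randn(byte_ids.numel(), 16)
print(pool_at_word_boundaries(h, byte_ids).shape)  # torch.Size([4, 16])
```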

The expansion path gradually restores the compressed information, using a multi-linear upsampling strategy in which each position's vector is adjusted according to its relative position in the sequence, improving the fusion of high-level information with local details. In addition, skip connections ensure that important local details are not lost during restoration, improving the model's generation quality and prediction accuracy.
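A hedged sketch of such position-aware upsampling: rather than copying one coarse vector into every restored slot, each of the k restored positions applies its own linear map, so the result depends on relative position. The randomly initialized layers here stand in for learned parameters.

```python
import torch

def multilinear_upsample(h: torch.Tensor, maps: torch.nn.ModuleList):
    """h: (batch, t, dim) -> (batch, t * len(maps), dim)."""
    # Slot i uses its own projection, so relative position matters.
    slots = [m(h) for m in maps]                 # k tensors, each (b, t, d)
    return torch.stack(slots, dim=2).flatten(1, 2)

k, d = 2, 16
maps = torch.nn.ModuleList(torch.nn.Linear(d, d) for _ in range(k))
out = multilinear_upsample(torch.randn(1, 4, d), maps)
print(out.shape)   # torch.Size([1, 8, 16])
```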

During inference, AU-Net generates text autoregressively, ensuring that the output remains coherent and accurate while keeping inference efficient. This innovative architecture offers new directions for the development of large language models, demonstrating greater flexibility and applicability.
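For intuition, here is a minimal byte-level autoregressive decoding loop (greedy). The `model` interface, a hypothetical stand-in, maps a byte-ID prefix to next-byte logits over the 256 possible byte values; given any callable with that shape (such as a trained checkpoint), `generate(model, b"Hello")` returns the prompt plus its continuation.

```python
import torch

@torch.no_grad()
def generate(model, prompt: bytes, max_new_bytes: int = 32) -> bytes:
    ids = list(prompt)                            # byte values 0-255
    for _ in range(max_new_bytes):
        x = torch.tensor(ids).unsqueeze(0)        # (1, seq_len)
        logits = model(x)                         # (1, seq_len, 256) assumed
        next_id = int(logits[0, -1].argmax())     # greedy next byte
        ids.append(next_id)
    return bytes(ids)
```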

Open source address: https://github.com/facebookresearch/lingua/tree/main/apps/aunet

Key points:

- 🚀 The AU-Net architecture autoregressively and dynamically combines bytes to form multi-level sequence representations.

- 📊 It uses contraction and expansion paths to ensure effective integration of macro semantic information and local details.

- ⏩ The autoregressive generation mechanism improves inference efficiency and ensures the coherence and accuracy of generated text.