MimicGen: Synthetic Data Fuels AI Imitation Learning
In a recent live conversation with Mark Penn, Chairman of Stagwell, Tesla and SpaceX CEO Elon Musk stated that the real-world data available for training artificial intelligence models is nearly exhausted: "We have basically consumed all the accumulated knowledge of humanity... for AI training data. This phenomenon essentially occurred last year." Musk's view echoes that of former OpenAI Chief Scientist Ilya Sutskever.
The rapid development of deep learning has relied on scaling up datasets, models, and computational power. In natural language processing and computer vision, researchers have already identified a power-law relationship between model performance and data scale. In robotics, however, and especially in robot manipulation, such scalable patterns have yet to be established. A research team from Tsinghua University recently published a paper exploring data scaling laws in robot imitation learning and proposed an efficient data collection strategy that gathered sufficient data in just one afternoon, enabling...
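To make the power-law idea concrete, here is a minimal sketch of how such a relationship is typically fit: linear regression in log-log space. The dataset sizes and error rates below are illustrative numbers, not figures from the Tsinghua paper.

```python
import numpy as np

# Illustrative (made-up) numbers: dataset sizes and corresponding
# task error rates following err = a * n^(-b) with a=2.0, b=0.301.
n = np.array([1e2, 1e3, 1e4, 1e5])
err = 2.0 * n ** -0.301

# A power law is a straight line in log-log space:
# log(err) = log(a) - b * log(n), so fit with ordinary least squares.
slope, log_a = np.polyfit(np.log(n), np.log(err), 1)
a, b = np.exp(log_a), -slope
print(f"fitted power law: err ~ {a:.2f} * n^(-{b:.3f})")
```

Real scaling-law studies fit exactly this form to measured performance at several data scales and then extrapolate to predict how much data a target error rate requires.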
This week, the Massachusetts Institute of Technology (MIT) showcased a new model for training robots that abandons the earlier focus on narrow, task-specific datasets in favor of an approach that draws on vast amounts of heterogeneous information, much as large language models (LLMs) are trained. The researchers noted that imitation learning, in which an agent learns by mimicking a person performing a task, can fail when conditions change even slightly: varying lighting, a different environment, or a new obstacle. In such cases the robot simply does not have enough data to adapt. Seeking the kind of data-rich approach behind models such as GPT-4, the team introduced a new architecture called the Heterogeneous Pretrained Transformer (HPT), which aggregates information from different sensors and environments.
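The core idea of aggregating heterogeneous sensor streams, projecting each modality into one shared token space before a common trunk processes them together, can be sketched roughly as follows. The modality names, dimensions, and the toy single-layer attention trunk are illustrative assumptions, not the actual HPT implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sensor streams with different dimensionalities
# (names and sizes are illustrative only).
obs = {
    "camera": rng.normal(size=(16,)),   # flattened image features
    "proprio": rng.normal(size=(7,)),   # joint positions
    "touch": rng.normal(size=(4,)),     # tactile readings
}

D = 8  # shared token width used by the common trunk

# One learned "stem" (here a random projection) per modality maps
# each stream into the shared embedding space.
stems = {k: rng.normal(size=(v.shape[0], D)) for k, v in obs.items()}
tokens = np.stack([obs[k] @ stems[k] for k in obs])  # shape (3, D)

# A shared trunk (a toy self-attention layer) then operates on the
# unified token sequence regardless of which sensors produced it.
scores = tokens @ tokens.T / np.sqrt(D)
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
fused = weights @ tokens  # shape (3, D)
print(fused.shape)
```

The design choice this illustrates is that only the small per-modality stems are sensor-specific; the expensive trunk is shared, so data from any robot or sensor suite can contribute to pretraining it.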
Meta AI's latest project, Llama 3, leans on synthetic data rather than human-written answers to improve model performance in areas such as code generation, mathematical reasoning, multilingual processing, and long-context handling. For code generation, Llama 3 produces synthetic data through three methods; for mathematical reasoning it draws on techniques from the research literature; and for multilingual ability it collects high-quality human annotations for pre-training. Llama 3's training is further enhanced with tools such as Brave Search, Wolfram Alpha, and a Python interpreter.
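A common pattern for building code-focused synthetic data, and a plausible reading of the approach described above, is to filter model-generated solutions by actually executing them against unit tests, keeping only the candidates that run correctly. The sketch below is a hypothetical illustration of that pattern, not Meta's actual pipeline; the `add` candidates stand in for model outputs.

```python
import subprocess
import sys
import tempfile

def passes_tests(candidate_code: str, test_code: str) -> bool:
    """Run a candidate solution plus its unit tests in a subprocess;
    keep the sample only if execution succeeds (exit code 0)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n" + test_code + "\n")
        path = f.name
    result = subprocess.run([sys.executable, path],
                            capture_output=True, timeout=10)
    return result.returncode == 0

# Two hypothetical model-generated candidates for the same prompt:
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
tests = "assert add(2, 3) == 5"

# Execution feedback keeps the correct candidate and drops the buggy one.
kept = [c for c in (good, bad) if passes_tests(c, tests)]
print(len(kept))
```

Running candidates in a separate process also sandboxes crashes and infinite loops (via the timeout), which matters when filtering untrusted model output at scale.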