According to The Information, AI giant OpenAI is developing a generative music tool that can create music from text descriptions or audio prompts provided by users. The move extends OpenAI's push into multimodal content generation, following the success of its text and video products such as ChatGPT and Sora.

Focus on Features: Video Soundtracks and Accurate Accompaniment
According to insiders, the tool could be used to add customized background music to existing videos, or to generate guitar or other instrumental accompaniment for existing vocal tracks. However, OpenAI has not yet clarified its release plan: the tool could launch as a standalone product or be integrated into existing products such as ChatGPT and the video generation application Sora.

Training Data Revealed: Collaboration with Juilliard School
An insider said that, to ensure the quality of the new model's training data, OpenAI is working with some students from the renowned Juilliard School to annotate musical scores in detail. The annotated scores will serve as a high-quality source of training data.
Although OpenAI released a music generation model before the launch of ChatGPT, the company has more recently focused on audio models for text-to-speech and speech-to-text. Its reported return to music generation would put it in competition with established players in this area, which include tech giants like Google and startups like Suno.
