Yi-VL Multimodal Language Model Released with Two Versions


Recently, Meta AI open-sourced a foundational multimodal language model named SPIRIT LM, which can freely mix text and speech, opening new possibilities for multimodal tasks involving audio and text. SPIRIT LM is based on a pre-trained text language model with 7 billion parameters, extended to the speech modality through continued training on text and speech units. It can understand and generate text like a large text model, understand and generate speech, and mix the two within a single sequence.
01.AI (Zero One Everything) has announced the open-sourcing of the Yi-9B model. Yi-9B is the strongest model in the Yi series in coding and mathematics, with strong overall ability, and it can be easily deployed on consumer-grade graphics cards. The company was founded by Kai-Fu Lee, Chairman and CEO of Sinovation Ventures (formerly Innovation Works).
NExT-GPT is an open-source multimodal language model developed by the National University of Singapore. It can process text, images, video, and audio, providing robust support for multimedia AI applications. It has a three-tier architecture consisting of linear projection layers, a Vicuna LLM core, and modality-specific transformation layers, with the intermediate layers trained using MosIT (modality-switching instruction tuning). The open-source release lets researchers and developers build applications that integrate multimodal inputs across a wide range of fields. What sets NExT-GPT apart is its ability to generate output in whichever modality the user requests.
AI video generation is moving from a random 'blind box' stage toward practical use. Although Sora initially caused anxiety across the industry, problems such as incoherent visuals still hinder industrial application. Wanjing Studio addresses this by refining workflows to turn AI video from a demo 'toy' into a reliable 'productivity tool', focusing on coherence and controllability.
Shanghai has added 11 generative AI services to its filing list, bringing the total to 149, the most of any region in China. The move implements regulatory measures and promotes AI innovation and standardized development, with models from research institutions standing out.