Yi-VL Multimodal Language Model Released with Two Versions


Meta AI recently open-sourced SPIRIT LM, a foundational multimodal language model that can freely mix text and speech, opening new possibilities for tasks that combine audio and text. SPIRIT LM builds on a pre-trained text language model with 7 billion parameters, extended to the speech modality through continued training on both text and speech units. It can understand and generate text like a large text model, understand and generate speech, and interleave the two within a single sequence.
01.AI (Zero One Everything) has announced the open-sourcing of the Yi-9B model. Yi-9B has the strongest coding and mathematical capabilities in the Yi series, performs well on general benchmarks, and can be easily deployed on consumer-grade graphics cards. The company was founded by Kai-Fu Lee, chairman and CEO of Sinovation Ventures.
NExT-GPT is an open-source multimodal language model developed by the National University of Singapore. It can process text, images, video, and audio, providing robust support for multimedia AI applications. Its architecture has three tiers: linear projection layers, a Vicuna LLM core, and modality-specific transformation layers, with the intermediate layers trained using MosIT (modality-switching instruction tuning). The open-source release lets researchers and developers build applications that integrate multimodal inputs across a wide range of fields. What sets NExT-GPT apart is its ability to generate output in whichever modality the user requests.
OpenAI is facing negative free cash flow of US$9 billion, highlighting the tension between technological leadership and financial sustainability. The main drivers of the cash outflow are infrastructure expansion, high operating costs, and lagging revenue growth, reflecting a dilemma common across the AI industry: aggressive investment is outpacing profitability.
OpenAI is reportedly testing GPT-5.1: the anonymous model 'Polaris Alpha' on OpenRouter offers a 256K context window, 128K output, and a knowledge cutoff of October 2024...