Zhejiang University Alumni Collaborate with Microsoft to Launch Multimodal Model LLaVA, Challenging GPT-4V


ByteDance and universities launch Sa2VA, which integrates LLaVA for video understanding with SAM-2 for precise object segmentation, using the two models' complementary capabilities to enhance video analysis.
Alibaba has established a robotics and embodied AI team, led by executive Lin Junyang, to develop robotics technology and advance embodied AI. Embodied AI refers to intelligent systems that interact with their environment through a physical body; the new team marks the company's further expansion in AI.
During a technical livestream at 1 AM today, OpenAI officially launched its latest and most powerful multimodal models: o4-mini and the full-power o3. The models can process text, images, and audio together. They also function as agents, automatically using tools such as web search, image generation, and code parsing. In addition, they have a deep thinking mode that lets them reason about images within a chain of thought.
Version 2.6 of WallFacer’s MiniCPM-V series has rapidly climbed into the Top 3 on the GitHub and HuggingFace trending lists, passing ten thousand stars. Since its release in February, the series has accumulated over a million downloads, becoming a benchmark for on-device model capabilities. With 8 billion parameters, MiniCPM-V2.6 improves on-device multimodal performance in real-time video understanding, multi-image joint understanding, and multi-image in-context learning, with a memory footprint of only 6GB after quantization and an inference speed of up to 18 tokens per second.
MiniCPM-V2.6 is an edge-side multimodal AI model that, with only 8B parameters, achieves state-of-the-art (SOTA) results among models under 20B parameters in single-image, multi-image, and video understanding, significantly enhancing the multimodal capabilities of edge AI and reaching performance fully comparable to GPT-4V.