University of California, Santa Cruz Develops Open-Source Multimodal Model MiniGPT-5


ByteDance's Seed team has launched Seedream 4.0, a next-generation image generation model with enhanced multimodal capabilities, supporting text-to-image, image-to-image, and multi-image editing for flexible creative workflows.
Step-Audio 2 mini, an open-source audio model from StepFun, achieves state-of-the-art results on multiple benchmarks. It excels at speech understanding, translation, and emotion analysis with unified audio reasoning and generation.
Nano AI Super Search, the intelligent search agent from 360, has received a major update, adding multimodal content generation, cross-domain professional search, and smarter task-preview functions. From one-click generation of PPTs and PDF reports to automatic assembly of videos, voiceover scripts, and storyboard plans, Nano AI is redefining the boundaries of AI search and creation with a more efficient, intuitive experience. AIbase has compiled the latest social-media updates to help you understand Nano AI's newest breakthroughs. Multimodal generation: handle everything from PPTs to videos with one click.
Recently, Tencent's Hunyuan video model officially began recruiting testing partners on X, marking a critical testing stage for this cutting-edge AI video-generation technology. According to official sources, the model will likely be open-sourced after testing concludes, contributing its technology to the global AI community. The Hunyuan video model is an important Tencent innovation in AI video generation: with over 13 billion parameters, it is among the largest open-source video-generation models.
AI startup Moondream has officially announced the completion of a $4.5 million seed round and advances a contrarian viewpoint: in the world of AI models, smaller models may hold the advantage. Backed by Felicis Ventures, Microsoft's M12 GitHub Fund, and Ascend, the company has launched a vision-language model with only 1.6 billion parameters that competes on performance with models four times its size.