Apple Releases Open Source Multimodal Machine Learning Model 'Ferret'


At the Beijing Auto Show, Ruan Chong, a former core researcher on DeepSeek's multimodal technology, appeared as chief scientist of Yuanrong Qixing, signaling the company's shift in autonomous driving technology. CEO Zhou Guang said that multimodal large models achieved breakthroughs in early 2026, and that the large-model-based approach to autonomous driving holds significant advantages over earlier techniques.
Xiaomi released the MiMo-V2.5 series of large models on April 23 and opened public testing. The series comprises four models, with the two core models, MiMo-V2.5-Pro and MiMo-V2.5, open-sourced globally, underscoring Xiaomi's commitment to an open AI ecosystem. The update is not only a product iteration but a comprehensive upgrade of the underlying technology stack, with flagship performance supporting context lengths of up to one million tokens and complex task processing.
Xiaohongshu has open-sourced RelaX, a reinforcement learning training engine designed specifically for multimodal and agent scenarios. It supports unified processing of text, images, audio, and video, in line with the industry's shift toward multimodal agents.
ByteDance's Volcano Engine opened public API applications for the Seedance2.0 multimodal video generation model on April 2, moving it from limited testing to broader availability. The model accepts text, image, audio, and video inputs, and supports character consistency, director-level shot control, and physical simulation.
Ant Forest LingBot Technology has open-sourced LingBot-Depth-Dataset, a large-scale RGB-D depth dataset containing 3 million high-quality samples: 2 million collected from real scenes and 1 million rendered. Totaling 2.71 TB and covering 6 mainstream depth cameras, it is currently the largest real-scene RGB-D dataset in the open-source community, providing richer data support for embodied intelligence, spatial perception, and 3D vision.
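To give a sense of how RGB-D data of this kind is typically consumed, here is a minimal sketch of back-projecting a depth map into a 3D point cloud with a pinhole camera model. The intrinsics, depth scale, and image size below are illustrative assumptions, not values from LingBot-Depth-Dataset; real values would come from each camera's calibration metadata.

```python
import numpy as np

# Hypothetical pinhole intrinsics (focal lengths and principal point).
# A real pipeline would load these per camera; the dataset spans 6
# mainstream depth camera models, each with its own calibration.
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0
DEPTH_SCALE = 0.001  # assume 16-bit depth stored in millimeters

def depth_to_pointcloud(depth_mm: np.ndarray) -> np.ndarray:
    """Back-project an HxW uint16 depth map into an Nx3 point cloud (meters)."""
    h, w = depth_mm.shape
    z = depth_mm.astype(np.float64) * DEPTH_SCALE
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = z > 0  # zero depth conventionally marks missing measurements
    x = (u[valid] - CX) * z[valid] / FX
    y = (v[valid] - CY) * z[valid] / FY
    return np.stack([x, y, z[valid]], axis=1)

# Synthetic 480x640 depth map: a flat wall 1.5 m away, with a small
# patch of missing (zero) readings, as real depth sensors produce.
depth = np.full((480, 640), 1500, dtype=np.uint16)
depth[:10, :10] = 0
points = depth_to_pointcloud(depth)
```

Invalid (zero-depth) pixels are dropped before back-projection, so the point count is the number of valid measurements rather than the full image resolution.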