Zhipu AI Open-Sources Visual Language Model CogAgent, Which Supports GUI Q&A


Ant Forest LingBot Technology has open-sourced a large-scale RGB-D depth dataset, LingBot-Depth-Dataset, containing 3 million high-quality samples: 2 million collected from real scenes and 1 million rendered. The dataset totals 2.71 TB and covers 6 mainstream depth cameras. It is currently the largest real-scene RGB-D dataset in the open-source community, and it provides richer data support for embodied intelligence, spatial perception, and 3D vision.
Microsoft has recruited top researchers from the Allen Institute for Artificial Intelligence (Ai2) and the University of Washington. The group is led by Ali Farhadi, former CEO of Ai2, who has joined Microsoft's newly established 'Super Intelligence' department. The move aims to strengthen the company's artificial general intelligence strategy.
Tokyo startup InfiniMind, founded by a former Google employee, has secured $5.8 million in seed funding. The company is developing AI infrastructure that turns massive volumes of unused video and audio "dark data" into searchable, structured business intelligence, addressing enterprise data processing challenges.
The open-source AI platform Hugging Face has refused a $5 billion investment from NVIDIA, drawing industry attention. Hugging Face is a globally active AI model library, and the refusal is not simply a matter of having sufficient funds: the company has previously accepted investments from giants like NVIDIA.
Step3-VL-10B, the open-source multimodal vision-language model from Stepwise Star, excels in multiple benchmark tests with only 10B parameters, addressing the limited capability typical of small models. The model achieves the best performance at its scale in visual perception, logical reasoning, and competition-level math, even surpassing open-source and closed-source flagship models 10 to 20 times its size.