Xi Xiaoyao Technology Talk | Stop Saying GPT-4V is Amazing! It Can't Even Recognize Peking Duck, Can You Believe It??


Tokyo startup InfiniMind, founded by a former Google employee, has secured $5.8 million in seed funding. The company is developing AI infrastructure that turns large volumes of unused video and audio "dark data" into searchable, structured business intelligence, addressing enterprise data processing challenges.
AI startup Moondream has announced the completion of a $4.5 million seed round and put forward a contrarian view: in the world of AI models, smaller models may hold the advantage. Backed by Felicis Ventures, Microsoft's M12 GitHub Fund, and Ascend, the company has launched a visual language model with only 1.6 billion parameters that can compete on performance with models four times its size.
H2O.ai recently announced two new visual language models aimed at improving the efficiency of document analysis and optical character recognition (OCR) tasks. The two models, H2OVL-Mississippi-2B and H2OVL-Mississippi-0.8B, compete well against models from large tech companies and may offer a more efficient option for businesses with document-heavy workflows.
The GitHub page of QwenLM, the Tongyi Qwen project from Alibaba Group, has been unexpectedly taken down, and users who try to access it receive a 404 error. Project leader Lin Junyang responded in a social media post, saying that the team has not disappeared and is in contact with GitHub to resolve the takedown. Despite the team's efforts, the page remains inaccessible. The team recently released the Qwen2-VL model, which handles video content up to 20 minutes long and leads on multiple authoritative multimodal benchmarks, with some scores even exceeding those of GPT.
On September 2nd, Tongyi Qwen announced the open-sourcing of its second-generation visual language model, Qwen2-VL, and launched APIs for the 2B and 7B sizes, as well as their quantized versions, on the Aliyun Bailian platform for direct user access. Qwen2-VL improves performance across several areas. It can understand images of different resolutions and aspect ratios, and achieves world-leading results on benchmarks such as DocVQA, RealWorldQA, and MTVQA.