A milestone breakthrough in the field of voice interaction! Recently, the domestic AI company Step Audio has shockingly open-sourced a



OpenAI launched two API updates to enhance the performance of AI agents in voice interaction and complex tasks. The new real-time model gpt-realtime-1.5 and its accompanying audio model significantly improve the reliability of voice commands. Internal testing shows that the new model has improved digit and letter transcription accuracy by about 10%, logic audio task accuracy by 5%, and instruction execution accuracy by 7%.
Google officially launched the new Gemini application on the Apple App Store, introducing the voice interaction feature Gemini Live, marking a significant breakthrough in the smart voice assistant field. Meanwhile, Apple's plan to integrate OpenAI's ChatGPT into Siri also points to intensifying competition in this area. As an upgraded version of Bard released by Google in 2023, Gemini is
Recently, CNKI launched the mobile version of its AI Academic Research Assistant, aimed at providing researchers with more convenient academic support. This AI assistant, after receiving widespread acclaim upon its launch on the PC platform, is now available through the CNKI mobile app, catering to users' needs for on-the-go access. The main features provided by the AI Academic Research Assistant include: Enhanced Question-Answering Retrieval: Users can ask questions in natural language, and the AI
📱 Smart Speaker Dilemma: The article analyzes the reasons why smart speakers are labeled as "toys", such as poor interaction experience and limited usage scenarios. 💡 Application of Large Models: The article mentions that large models bring new development opportunities for smart speakers and can significantly enhance their interaction experience. 🤝 Industry Competition: The article discusses the competitive landscape in the smart speaker market involving Baidu, Alibaba, Xiaomi, and others.
The robotics research team at Google DeepMind recently released a robotics project called RT-2. This project took 7 months to develop and uses a large model for training. RT-2 has capabilities such as symbol understanding, reasoning, and human recognition, and can reason about and complete tasks based on human instructions. By combining the large model with the robot's operational capabilities, RT-2 can accomplish tasks that involve logical leaps, such as going from 'extinct animals' to 'plastic dinosaurs'. The project performed well in various sub-category tests, with performance up to three times that of the previous generation of robot models. This result demonstrates the potential of large models in robotics research and is expected to drive the future development of robots.