ChatGPT Answers More Than Half of Software Engineering Questions Incorrectly


Research indicates that the SWE-bench Verified benchmark may overestimate AI programming capabilities, as about half of the AI code solutions deemed 'passed' in the test would be rejected in real project reviews, highlighting a significant gap between automated evaluation and actual engineering quality. This finding raises important questions about the standards for assessing AI-assisted software engineering.
A joint test by CNN and the Center for Countering Digital Hatred shows that mainstream AI chatbots have weak safety mechanisms in scenarios simulating adolescents with violent tendencies, and struggle to prevent the associated risks effectively.
An a16z report shows generative AI apps expanding rapidly worldwide, with ChatGPT dominating the market. ChatGPT significantly outperforms Gemini in web and mobile traffic, with 500 million weekly active users and over 10% of the global population using it weekly.
a16z's GenAI app ranking shows ChatGPT in the lead, but Chinese AI apps such as DeepSeek (4th globally), Kimi, and Alibaba's Qwen are rising fast, highlighting China's growing global AI competitiveness.
OpenAI plans to integrate its video generator Sora into ChatGPT to enhance multimodal AI capabilities, boost video creation, and drive user growth, aiming to attract visual creators amid competition from Google Veo and Meta.