Researchers Successfully Induce AI Chatbot to Reveal Harmful Content


Gu Quanquan, a core researcher on ByteDance's Seed team, has confirmed leaving the company. In a post on a social platform, Gu summarized three years of research there on AI drug discovery and the pre-training of large language models. SeedFold, the biomolecular structure prediction model developed under Gu's lead, performed strongly on multiple public benchmarks. The departure comes as ByteDance accelerates the commercialization of its AI business, and it has drawn attention to the emerging wave of AI for Science startups.
The reasoning capabilities of large language models are facing a serious test in cybersecurity. Security researcher Kasra Rahjerdi ran simulated hacker attacks with mainstream large models against an APK he built with core vulnerabilities in its book review data, probing how well the models actually reason about security and exploit vulnerabilities. Each test ran for 2 hours on a budget of $10, giving a direct view of how each model performs on complex logical challenges.
As generative AI sweeps through programming, the Zig open-source project has adopted a strict policy in the opposite direction: contributions may not include any code or comments generated by large language models. After commentary from Simon Willison, the policy sparked a community discussion about the trade-off between technical efficiency and talent development, that is, between producing code and growing contributors. The Zig maintainers have redefined "contributions" to emphasize originality and the learning process.
Large language model inference efficiency has made a breakthrough. Tsinghua University and Moonshot AI have jointly proposed a new architecture called "Prefill-as-a-Service," which splits inference into two stages, prefill and decoding, and optimizes how computing resources are allocated between them. According to the researchers, this effectively addresses hardware limitations and significantly improves model serving performance.
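For readers unfamiliar with the prefill/decoding split, the following is a minimal toy sketch of the general idea, not the Prefill-as-a-Service design itself. All names here (`KVCache`, `prefill`, `decode`, `serve`) are hypothetical, and the "model" is a stand-in arithmetic rule. It only shows that prefill handles the whole prompt once and hands a cache to a decoding loop that emits one token per step, which is why the two stages can in principle run on different resources.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KVCache:
    """Toy stand-in for the per-request cache that prefill hands to decoding."""
    entries: List[int] = field(default_factory=list)


def prefill(prompt_tokens: List[int]) -> KVCache:
    """Prefill stage: process the entire prompt in one pass and build the cache."""
    return KVCache(entries=list(prompt_tokens))


def decode(cache: KVCache, max_new_tokens: int) -> List[int]:
    """Decoding stage: generate one token per step, extending the cache each time."""
    generated = []
    for _ in range(max_new_tokens):
        # Hypothetical toy "model": the next token depends on everything cached so far.
        next_token = (sum(cache.entries) + len(cache.entries)) % 100
        cache.entries.append(next_token)
        generated.append(next_token)
    return generated


def serve(prompt_tokens: List[int], max_new_tokens: int) -> List[int]:
    """Run the two stages in sequence; in a split deployment (an assumption here),
    each stage could be scheduled on a separate pool of hardware."""
    cache = prefill(prompt_tokens)
    return decode(cache, max_new_tokens)
```

Calling `serve([1, 2, 3], 2)` runs prefill once over the three prompt tokens and then two decoding steps. In a real system the cache would hold attention keys and values rather than token IDs.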
Research shows that current mainstream AI models still have significant shortcomings in simulating clinical diagnostic reasoning and are not yet capable of handling medical tasks independently. The study tested 21 large language models, and its results were published in JAMA Network Open.