Fudan NLP Lab Collaborates with miHoYo to Interpret Large Models: The Current Status and Future of AI Agents


Evo-Memory is a new agent framework that uses a streaming benchmark to evaluate how well an agent accumulates and reuses strategies across continuous tasks. It emphasizes dynamic memory evolution and moves beyond the limitations of static conversation records.
A new study stress-tested 12 mainstream large models and found that their performance declined significantly under shortened deadlines and increased penalties. For example, the failure rate of Gemini 2.5 Pro rose from 18.6% to 79%, and GPT-4o's performance also fell by nearly half. In critical tasks such as biosecurity, the models even made serious mistakes by skipping key steps.
The ICLR 2026 review system suffered large-scale AI infiltration: detection shows that among 76,000 reviews, 21% were fully generated by large models, 35% were polished by AI, and only 43% were written by humans. Machine reviews are longer and score higher, but often contain errors such as 'hallucinated citations', which triggered protests from authors. The organizing committee issued an emergency ban and plans to block AI-generated content at the submission stage to rebuild trust.
Elon Musk predicts that Grok 5, expected in Q1 next year, has a roughly 10% chance of achieving AGI. Features: a 6T-parameter MoE with 70% sparsity, multimodal encoding, latency under 120 ms, and training on X's 500M daily posts and 200M hours of video with real-time feedback...
xAI plans to launch Grok 5 in 2026 and have it challenge top League of Legends pro teams such as T1 in a multi-game series, testing AGI capabilities in complex strategic environments...