On March 18, Apple was again named as a defendant in a copyright infringement lawsuit, this one filed by Chicken Soup for the Soul, LLC, which alleges that Apple trained artificial intelligence models on The Pile, a dataset that contains pirated books. The wide-ranging suit also names global technology giants Meta, xAI, Google, Anthropic, OpenAI, Perplexity, and NVIDIA. At the core of the case is Books3, the shadow-library component of the dataset, which contains a large number of copyrighted literary works.


In response to the allegations, Apple reiterated that since 2024 it has been committed to building AI datasets legally and ethically. Although Apple researchers used The Pile in the open-source OpenELM project, the company emphasized that the project was for public research only and is not used to power the core Apple Intelligence system. Legal analysts, however, believe that because Apple's base model has been assisted by Google Gemini, a finding that Google violated the law in this case could expose Apple to complex joint liability through its technical supply chain.

So far, companies such as Perplexity have defended their web scraping practices, while Apple maintains that its model training is transparent and compliant. As AI regulation tightens, this class-action lawsuit over underlying training data marks an escalation in creators' resistance to tech giants' "data exploitation," and it will also force the industry to re-examine the compliance costs and technical limits of "data traceability" in model training.