On March 18, Apple was again named as a defendant in a copyright infringement lawsuit filed by Chicken Soup for the Soul, LLC, which alleges that Apple used "The Pile" dataset, which contains pirated books, to train artificial intelligence models. The lawsuit also names global technology giants Meta, xAI, Google, Anthropic, OpenAI, Perplexity, and NVIDIA. At the core of the case is "Books3," a shadow library component of the dataset that contains a large number of copyrighted literary works.

In response to the allegations, Apple reiterated that it has been committed since 2024 to building AI datasets legally and ethically. Although Apple researchers had used "The Pile" dataset in the open-source project
So far, companies such as Perplexity have defended their web scraping practices, while Apple maintains that its model training is transparent and compliant. As AI regulation tightens, this class-action lawsuit over underlying training data not only marks an escalation in creators' resistance to tech giants' "data exploitation," but will also force the industry to re-examine the compliance costs and technical limits of "data traceability" in model training.
