Did Meta AI Break the Law? Llama Reproduces Up to 42% of a Harry Potter Book, Raising the Stakes in Copyright Lawsuits

A recent paper by researchers from Stanford University, Cornell University, and West Virginia University reveals that Meta's Llama 3.1 AI model can reproduce large amounts of copyrighted book content verbatim, posing a potential legal risk for the tech giant. The study found that the Llama 3.1 70B model could reproduce up to 42% of the text of "Harry Potter and the Philosopher's Stone" during testing, far surpassing the 4.4% achieved by the first-generation Llama model.

AI models such as OpenAI's ChatGPT and Meta's Llama are typically trained on massive datasets so that they learn patterns in language and can generate new text. However, the key finding of this research is that Meta's Llama model appears not just to have learned language patterns but to have almost "fully memorized" certain books, such as "Harry Potter" and "1984." Mark Lemley, a technology law expert at Stanford, stated that if an AI can generate complete excerpts from its training data, it no longer qualifies as a "transformative work" based on learning but instead resembles a "giant .ZIP file" containing copyrighted works, which users can copy freely.

Copyright Controversy: Verbatim Reproduction vs. Learning Patterns

In testing AI models from companies like OpenAI, DeepSeek, and Microsoft, Lemley's research team found that Meta's Llama was the only model able to reproduce book content accurately. Besides the first book in the "Harry Potter" series, the model also showed substantial memorization of F. Scott Fitzgerald's "The Great Gatsby" and George Orwell's "1984."

Meta's use of copyrighted materials to train its AI has been highly controversial. The company is currently facing multiple copyright lawsuits, including one filed by notable authors (such as comedian Sarah Silverman) who accuse Meta of training its models on the illegally obtained "Books3" dataset, which contains nearly 200,000 copyrighted publications. Court documents show that a Meta engineer once commented, "It felt wrong downloading torrents with a company laptop."

Lemley estimates that if only 3% of the content in the "Books3" dataset is deemed infringing, Meta could face statutory damages of nearly $1 billion, not including profit sharing. If the infringement ratio is higher, Meta's potential legal liabilities would be even more severe.
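The article does not show how the figure is derived, but the arithmetic can be sketched under stated assumptions. The short Python example below takes the article's "nearly 200,000" works and 3% share, and assumes (this is not from the article) that each infringing work is awarded the $150,000 ceiling that U.S. copyright law sets for willful infringement. The function name and constants are illustrative only; Lemley's actual calculation may differ.

```python
# Rough sketch of the statutory-damages estimate, under labeled assumptions.

BOOKS3_WORKS = 200_000            # from the article: "nearly 200,000" publications
INFRINGING_SHARE = 0.03           # from the article: "only 3%" deemed infringing
MAX_DAMAGES_PER_WORK = 150_000    # ASSUMPTION: U.S. statutory ceiling per work
                                  # for willful infringement, in dollars


def statutory_exposure(total_works: int, infringing_share: float,
                       damages_per_work: int) -> int:
    """Return total statutory damages in dollars for the infringing share."""
    infringing_works = round(total_works * infringing_share)
    return infringing_works * damages_per_work


exposure = statutory_exposure(BOOKS3_WORKS, INFRINGING_SHARE, MAX_DAMAGES_PER_WORK)
print(f"${exposure:,}")  # prints "$900,000,000"
```

Under these assumptions, 6,000 works at $150,000 each come to $900 million, which is consistent with "nearly $1 billion." A lower per-work award or a smaller share would reduce the total proportionally.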

Legal Expert Shifts Stance, Meta Declines to Comment

Notably, Lemley himself represented Meta in earlier generative AI copyright litigation (Kadrey v. Meta Platforms). However, while leading this research on how AI models memorize and reproduce copyrighted content, he announced earlier this year that he would no longer represent Meta, in protest of certain conduct by the company and its CEO, Mark Zuckerberg. Although he previously believed Meta should win, the new research findings seem to have changed his view.

Meta declined to comment on Lemley's latest research findings.
