Major Publishers Sue Meta and Zuckerberg, Alleging Copyright Infringement in AI Training

Major publishers are suing Meta, claiming the company illegally used pirated books and journals to train its Llama AI models.

The case centers on whether AI training is protected 'fair use' or whether using pirated data to create competing content constitutes infringement.

The market reacted calmly, viewing the lawsuit as a manageable financial risk for Meta rather than an existential threat.

Meta is facing a major legal battle over how it built its powerful AI models.

Five large publishers and author Scott Turow have filed a lawsuit against Meta and its CEO, Mark Zuckerberg, alleging 'massive' copyright infringement. They claim that the company used pirated books and journals, sourced from sites like LibGen, to train its Llama series of AI models. Meta plans to fight the case aggressively by arguing that AI training qualifies as 'fair use', but the plaintiffs are employing a more nuanced strategy.

This case highlights a critical split in U.S. case law. On one hand, the 'Kadrey v. Meta' case from 2025 found that training an AI on books was fair use. On the other hand, the 'Thomson Reuters v. ROSS' decision, also from 2025, ruled against fair use when a company copied proprietary data to build a competing product, emphasizing the 'method of acquisition' and 'market substitution'.

The publishers are leaning heavily on the ROSS precedent. First, they point to unsealed emails from 2025 suggesting Meta employees knowingly discussed torrenting pirated libraries, which directly addresses the 'method of acquisition'. Second, they argue that AI-generated content can substitute for human-authored books, harming the market for their original works. This legal approach is designed to sidestep the arguments that won the Kadrey case for Meta.

Furthermore, the context has changed. Over the past couple of years, a clear licensing market has emerged, with companies like OpenAI signing deals with the Financial Times and News Corp. This makes it harder to argue that using copyrighted content for training should be free.

The market's initial reaction was mild, with Meta's stock dipping only slightly. Investors seem to believe that even if Meta loses, the financial penalty, likely in the low single-digit billions based on a similar settlement involving Anthropic, would be a significant but manageable cost for a company of its size.

Ultimately, the outcome will depend on which legal argument the court finds more persuasive: the general principle of training as fair use, or the specific allegations of using pirated data to create market substitutes.

Fair Use: A legal doctrine that permits the unlicensed use of copyright-protected works in certain circumstances, such as for criticism, comment, news reporting, teaching, scholarship, or research.
Market Substitution: The degree to which a new product or service replaces the demand for an existing one. In this context, it refers to whether AI-generated content reduces the market for original, human-authored books.
LibGen (Library Genesis): A shadow library website that provides free access to millions of scholarly articles, journals, and books, often in violation of copyright.
