Matt O’Brien
In a test case for the artificial intelligence industry, a federal judge has ruled that AI company Anthropic did not break the law by training its chatbot Claude on millions of copyrighted books.
However, the company is still on the hook and must now go to trial over how it acquired those books: by downloading them from online "shadow libraries" of pirated copies.
U.S. District Judge William Alsup of San Francisco said in a ruling filed late Monday that the AI system's distilling from thousands of written works to be able to produce its own passages of text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative."
"Like any reader aspiring to be a writer, Anthropic's (AI large language models) trained upon works not to race ahead and replicate or supplant them," Alsup wrote.
But while dismissing a key claim made by the group of authors who sued the company for copyright infringement last year, Alsup also said Anthropic must still go to trial in December over its alleged theft of their works.
"Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote.
A trio of writers, Andrea Bartz, Charles Graeber and Kirk Wallace Johnson, alleged in their lawsuit last summer that Anthropic's practices amounted to "large-scale theft," and that the San Francisco-based company "seeks to profit from strip-mining the human expression and ingenuity behind each one of those works."
Books are known to be an important source of the data (in essence, billions of words carefully strung together) needed to build large language models. In the race to outdo each other in developing the most advanced AI chatbots, a number of tech companies have turned to online repositories of stolen books that they can get for free.
Documents disclosed in San Francisco's federal court showed Anthropic employees' internal concerns about the legality of their use of pirate sites. The company later shifted its approach and hired Tom Turvey, the former Google executive in charge of Google Books, a searchable library of digitized books that successfully weathered years of copyright battles.
With his help, Anthropic began buying books in bulk, tearing off the bindings and scanning each page before feeding the digitized versions into its AI model, according to court documents. But that did not undo the earlier piracy, according to the judge.
"That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages," Alsup wrote.
The ruling could set a precedent for similar lawsuits that have piled up against OpenAI, the maker of ChatGPT, and against Meta Platforms, the parent company of Facebook and Instagram.
Anthropic, founded by former OpenAI leaders in 2021, has marketed itself as the more responsible and safety-focused developer of generative AI models that can compose emails, summarize documents and interact with people in a natural way.
But the lawsuit filed last year alleged that Anthropic's actions "have made a mockery of its lofty goals" by building its AI product on pirated writings.
Anthropic said Tuesday it was pleased that the judge recognized that AI training is consistent with "copyright's purpose in enabling creativity and fostering scientific progress." Its statement did not address the piracy claims.
The authors' attorneys declined to comment.
Originally published: June 24, 2025, 6:24 p.m. EDT