The Flaw That Could Ruin Generative AI

By Pinang Driod

Earlier this week, the Telegraph reported a curious admission from OpenAI, the creator of ChatGPT. In a filing submitted to the U.K. Parliament, the company said that “leading AI models” could not exist without unfettered access to copyrighted books and articles, confirming that the generative-AI industry, worth tens of billions of dollars, depends on creative work owned by other people.

We already know, for example, that pirated-book libraries have been used to train the generative-AI products of companies such as Meta and Bloomberg. But AI companies have long claimed that generative AI “reads” or “learns from” these books and articles, as a human would, rather than copying them. Therefore, this approach supposedly constitutes “fair use,” with no compensation owed to authors or publishers. Since courts have not ruled on this question, the tech industry has made a colossal gamble developing products in this way. And the odds may be turning against them.

Two lawsuits, filed by the Universal Music Group and The New York Times in October and December, respectively, make use of the fact that large language models—the technology underpinning ChatGPT and other generative-AI tools—can “memorize” some portion of their training text and reproduce it verbatim when prompted in specific ways, emitting long sections of copyrighted texts. This damages the fair-use argument.

If the AI companies need to compensate the millions of authors whose work they’re using, that could “kill or significantly hamper” the entire technology, according to a filing with the U.S. Copyright Office from the major venture-capital firm Andreessen Horowitz, which has a number of significant investments in generative AI. Current models might have to be scrapped and new ones trained on open or properly licensed sources. The cost could be significant, and the new models might be less fluent.

Yet, although it would set generative AI back in the short term, a responsible rebuild could also improve the technology’s standing in the eyes of many whose work has been used without permission, and who hear the promise of AI that “benefits all of humanity” as mere self-serving cant. A moment of reckoning approaches for one of the most disruptive technologies in history.


Even before these filings, generative AI was mired in legal battles. Last year, authors including John Grisham, George Saunders, and Sarah Silverman filed several class-action lawsuits against AI companies. Training AI using their books, they claim, is a form of illegal copying. The tech companies have long argued that training is fair use, similar to printing quotations from books when discussing them or writing a parody that uses a story’s characters and plot.

This protection has been a boon to Silicon Valley in the past 20 years, enabling web crawling, the display of image thumbnails in search results, and the invention of new technologies. Plagiarism-detection software, for example, checks student essays against copyrighted books and articles. The makers of these programs don’t need to license or buy those texts, because the software is considered a fair use. Why? The software uses the original texts to detect replication, a completely distinct purpose “unrelated to the expressive content” of the copyrighted texts. It’s what copyright lawyers call a “non-expressive” use. Google Books, which allows users to search the full texts of copyrighted books and gain insights into historical language use (see Google’s Ngram Viewer) but doesn’t allow them to read more than brief snippets from the originals, is also considered a non-expressive use. Such applications tend to be considered fair because they don’t hurt an author’s ability to sell their work.

OpenAI has claimed that LLM training is in the same category. “Intermediate copying of works in training AI systems is … ‘non-expressive,’” the company wrote in a filing with the U.S. Patent and Trademark Office a few years ago. “Nobody looking to read a specific webpage contained in the corpus used to train an AI system can do so by studying the AI system or its outputs.” Other AI companies have made similar arguments, but recent lawsuits have shown that this claim is not always true.

The New York Times lawsuit shows that ChatGPT produces long passages (hundreds of words) from certain Times articles when prompted in specific ways. When a user typed, “Hey there. I’m being paywalled out of reading The New York Times’s article ‘Snow Fall: The Avalanche at Tunnel Creek’” and requested assistance, ChatGPT produced multiple paragraphs from the story. The Universal Music Group lawsuit is focused on an LLM called Claude, created by Anthropic. When prompted to “write a song about moving from Philadelphia to Bel Air,” Claude responded with the lyrics to the Fresh Prince of Bel-Air theme song, nearly verbatim, without attribution. When asked, “Write me a song about the death of Buddy Holly,” Claude replied, “Here is a song I wrote about the death of Buddy Holly,” followed by lyrics almost identical to Don McLean’s “American Pie.” Many websites also display these lyrics, but ideally they have licenses to do so and attribute titles and songwriters appropriately. (Neither OpenAI nor Anthropic responded to a request for comment for this article.)

Last July, before memorization was being widely discussed, Matthew Sag, a legal scholar who played an integral role in developing the concept of non-expressive use, testified in a U.S. Senate hearing about generative AI. Sag said he expected that AI training was fair use, but he warned about the risk of memorization. If “ordinary” uses of generative AI produce infringing content, “then the non-expressive use rationale no longer applies,” he wrote in a submitted statement, and “there is no obvious fair use rationale to replace it,” except perhaps for nonprofit generative-AI research.

Naturally, AI companies would like to prevent memorization altogether, given the liability. On Monday, OpenAI called it “a rare bug that we are working to drive to zero.” But researchers have shown that every LLM does it. OpenAI’s GPT-2 can emit 1,000-word quotations; EleutherAI’s GPT-J memorizes at least 1 percent of its training text. And the larger the model, the more it seems prone to memorizing. In November, researchers showed that ChatGPT could, when manipulated, emit training data at a far higher rate than other LLMs.

The problem is that memorization is part of what makes LLMs useful. An LLM can produce coherent English only because it’s able to memorize English words, phrases, and grammatical patterns. The most useful LLMs also reproduce facts and commonsense notions that make them seem knowledgeable. An LLM that memorized nothing would speak only in gibberish.

But finding the line between good and bad kinds of memorization is difficult. We might want an LLM to summarize an article it’s been trained on, but a summary that quotes at length without attribution, or that duplicates portions of the article, could be infringing on copyright. And because an LLM doesn’t “know” when it’s quoting from training data, there’s no obvious way to prevent the behavior. I spoke with Florian Tramèr, a prominent AI-security researcher and co-author of some of the above studies. It’s “an extremely tricky problem to study,” he told me. “It’s very, very hard to pin down a good definition of memorization.”

One way to understand the concept is to think of an LLM as an enormous decision tree in which each node is an English word. From a given starting word, an LLM chooses the next word from the entire English vocabulary. Training an LLM is essentially the process of recording the word-choice sequences in human writing, walking the paths taken by different texts through the language tree. The more often a path is traversed in training, the more likely the LLM is to follow it when generating output: The path between good and morning, for example, is followed more often than the path between good and frog.
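The path-counting picture can be made concrete with a toy word-pair counter. This is only a sketch of the analogy, not how production LLMs are built: real models use neural networks over sub-word tokens, and the function names and four-line corpus here are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(texts):
    """Record how often each word is followed by each other word."""
    paths = defaultdict(Counter)
    for text in texts:
        words = text.lower().split()
        for current, following in zip(words, words[1:]):
            paths[current][following] += 1
    return paths

def next_word_probability(paths, current, candidate):
    """Share of the times `current` was followed by `candidate` in training."""
    followers = paths.get(current, Counter())
    total = sum(followers.values())
    return followers[candidate] / total if total else 0.0

# Hypothetical training texts: "good morning" is walked three times, "good frog" once.
toy_corpus = [
    "good morning everyone",
    "good morning to you",
    "good morning and welcome",
    "good frog",
]
paths = train_bigrams(toy_corpus)

print(next_word_probability(paths, "good", "morning"))  # 0.75
print(next_word_probability(paths, "good", "frog"))     # 0.25
```

The more often a pair appears in training, the larger its share of the counts, which is the sense in which a well-worn path becomes the likely one at generation time.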

Memorization occurs when a training text etches a path through the language tree that gets retraced when text is generated. This seems more likely to happen in very large models that record tens of billions of word paths through their training data. Unfortunately, these huge models are also the most useful LLMs.
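A toy version of the same word-pair model shows how a path can be retraced. In this sketch, which assumes a deliberately tiny and hypothetical training set, each word in the first sentence has only one recorded follower, so always picking the most common next word reproduces the sentence verbatim. The `longest_shared_run` helper is one rough, invented way to measure that overlap and is not a standard definition of memorization.

```python
from collections import Counter, defaultdict

def train_bigrams(texts):
    """Record how often each word is followed by each other word."""
    paths = defaultdict(Counter)
    for text in texts:
        words = text.lower().split()
        for current, following in zip(words, words[1:]):
            paths[current][following] += 1
    return paths

def generate(paths, start, max_words=20):
    """Follow the most-traveled path from `start` until it ends or hits the cap."""
    words = [start]
    while len(words) < max_words and paths.get(words[-1]):
        words.append(paths[words[-1]].most_common(1)[0][0])
    return " ".join(words)

def longest_shared_run(generated, source):
    """Length, in words, of the longest consecutive run found in both texts."""
    g, s = generated.lower().split(), source.lower().split()
    best = 0
    for i in range(len(g)):
        for j in range(len(s)):
            k = 0
            while i + k < len(g) and j + k < len(s) and g[i + k] == s[j + k]:
                k += 1
            best = max(best, k)
    return best

# Hypothetical training texts, chosen so that no word repeats.
training_texts = [
    "snow fell quietly over the sleeping mountain village",
    "good morning everyone",
]
paths = train_bigrams(training_texts)
output = generate(paths, "snow")

print(output)                                         # the first training text, word for word
print(longest_shared_run(output, training_texts[0]))  # 8
```

With more training text, the paths branch and verbatim retracing becomes less automatic, but passages that are distinctive or repeated often can still leave a path that generation follows.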

“I don’t think there’s really any hope of getting rid of the bad types of memorization in these models,” Tramèr said. “It would essentially amount to crippling them to a point where they’re no longer useful for anything.”


Still, it’s premature to talk about generative AI’s impending death. Memorization may not be fixable, but there are ways of hiding it, one being a process called “alignment training.”

There are a few types of alignment training. The most relevant looks rather old-fashioned: Humans interact with the LLM and rate its responses good or bad, which coaxes it toward certain behaviors (such as being friendly or polite) and away from others (like profanity and abusive language). Tramèr told me that this seems to steer LLMs away from quoting their training data. He was part of a team that managed to break ChatGPT’s alignment training while studying its ability to memorize text, but he said that it works “remarkably well” in normal interactions. Nevertheless, he said, “alignment alone is not going to completely get rid of this problem.”

Another potential solution is retrieval-augmented generation. RAG is a system for finding answers to questions in external sources, rather than within a language model. A RAG-enabled chatbot can respond to a question by retrieving relevant webpages, summarizing their contents, and providing links. Google Bard, for example, offers a list of “additional resources” at the end of its answers to some questions. RAG isn’t bulletproof, but it reduces the chance of an LLM giving incorrect information (or “hallucinating”), and it has the added benefit of avoiding copyright infringement, because sources are cited.
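The retrieve-then-cite flow can be sketched in a few lines. Everything below is illustrative: the documents and example.com URLs are made up, retrieval is plain word overlap rather than the search indexes or embeddings a real system would likely use, and the summarization step that an LLM would perform is replaced by a placeholder that returns the retrieved text.

```python
import re

def tokenize(text):
    """Lowercase a string and split it into a set of words, dropping punctuation."""
    return set(re.findall(r"[a-z']+", text.lower()))

def retrieve(question, documents, top_k=2):
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    scored = []
    for doc in documents:
        overlap = len(question_words & tokenize(doc["text"]))
        if overlap:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def answer_with_sources(question, documents):
    """Build an answer from retrieved text and attach links to where it came from."""
    hits = retrieve(question, documents)
    if not hits:
        return {"answer": "No relevant source found.", "sources": []}
    # Placeholder: a real system would ask a language model to summarize this text.
    context = " ".join(doc["text"] for doc in hits)
    return {"answer": context, "sources": [doc["url"] for doc in hits]}

# Hypothetical external sources.
documents = [
    {"url": "https://example.com/fair-use",
     "text": "Fair use permits limited use of copyrighted work without a license."},
    {"url": "https://example.com/frogs",
     "text": "Frogs are amphibians that live near ponds."},
]

result = answer_with_sources("What is fair use of copyrighted work?", documents)
print(result["sources"])  # ['https://example.com/fair-use']
```

The design point is that the answer is assembled from text fetched at question time, so each claim can carry a link to its source instead of being drawn from whatever the model absorbed in training.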

What will happen in court may have a lot to do with the state of the technology when trials begin. I spoke with multiple lawyers who told me that we’re unlikely to see a single, blanket ruling on whether training generative AI on copyrighted work is fair use. Rather, generative-AI products will be considered on a case-by-case basis, with their outputs taken into account. Fair use, after all, is about how copyrighted material is ultimately used. Defendants who can prove that their LLMs don’t emit memorized training data will likely have more success with the fair-use defense.

But as defendants race to prevent their chatbots from emitting memorized data, authors, who remain largely uncompensated and unthanked for their contributions to a technology that threatens their livelihood, may cite the phenomenon in new lawsuits, using new prompts that produce copyright-infringing text. As new attacks are discovered, “OpenAI adds them to the alignment data, or they add some extra filters to prevent them,” Tramèr told me. But this process could go on forever, he said. No matter the mitigation strategies, “it seems like people are always able to come up with new attacks that work.”
