“Copyright protects reproductions, so from a legal perspective, the way you assess memorisation is by looking at the output.” – Vincent Bergeron
By Julius Melnitzer | March 4, 2026
A fast-growing controversy over whether artificial intelligence (AI) “learns” or “memorises” is posing a foundational challenge to copyright law.
What’s ignited the controversy are recent academic studies demonstrating that AI can reproduce copyrighted work with high accuracy. Most famously, The Atlantic reported that researchers from Yale and Stanford universities, using strategic prompts, had a major AI model reproduce “near-complete text” from several copyrighted works, including George Orwell’s 1984 and J.K. Rowling’s Harry Potter and the Sorcerer’s Stone.
And whether AIs memorise or learn is not merely a technical issue: its resolution will determine the extent to which copyright is still an effective mechanism for controlling and monetising works in an AI-driven environment.
What’s memorising? What’s learning?
The key to the determination lies in the output.
“Copyright protects reproductions, so from a legal perspective, the way you assess memorisation is by looking at the output,” says Vincent Bergeron, a principal in the Montreal office of ROBIC, a member of the IPH network. “And when you’re talking about memorisation in AI, one of the key questions is whether it can reproduce part of its training verbatim or nearly so.”
AI developers maintain that large language models (LLMs) train by “learning” statistical patterns but do not store copies of training data in the model’s parameters. The implication is that AI learning involves abstraction, similar to how humans learn.
“If AI models learn, then training on copyrighted works is more likely to engage arguments based on fair use or fair dealing exceptions,” Bergeron says.
Rights holders counter that the output is coming from inside the model itself, meaning that training involves copying expressive content (which amounts to reproduction).
“What the available research and public information tells us is that most developers encode models using statistical relationships, but some models exhibit memorisation effects that allow the output to reflect the copyrighted materials a bit too closely,” Bergeron says.
In this context, it’s important to distinguish “memorisation” from “retrieval”. Unlike memorisation, retrieval does not encode specific copyrighted sequences internally. Rather, retrieval occurs when an AI (if it’s connected to the Internet) “fetches” or “looks up” information from an external source while actively answering a query—not during training and without storing the content internally after delivering the answer.
“Much like a search engine, an AI that only retrieves from the internet is not reproducing something from its training materials,” Bergeron says. “It’s merely accessing external content that already exists and showing it to you.”
Prove it
But determining whether AI learns or memorises is challenging.
“It’s very complicated to demonstrate how an AI is trained, because what’s happening in that big black box is not clear even to developers,” Bergeron says. “It’s not like the models are retaining a database of the information on which they’ve been trained.”
Still, there are techniques lawyers can employ to build a persuasive case that a particular AI is memorising. They include:
- Demonstrating that the output or part of the output is identical or near identical to the original;
- Showing that the output reveals information to which the AI should not have access;
- Using experts to show that the model behaves differently (more accurately) when copyrighted data is input, leading to the inference that the copyrighted data was used in training;
- Compelling disclosure of training data and internal emails to prove unlawful copying;
- Having experts do reproducibility tests that check whether the AI reproduces the copyrighted text even when prompts are varied;
- Having experts testify that the AI’s output can only be reproduced if it accessed the copyrighted material; and
- Proving that the AI’s behaviour matches known memorisation patterns in LLMs.
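The first and fifth techniques above amount to measuring verbatim overlap between a model’s output and the original work. A toy illustration of that kind of check, assuming a simple word n-gram comparison (real forensic testing is far more rigorous, and all function names here are hypothetical):

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(source: str, output: str, n: int = 8) -> float:
    """Fraction of the output's n-grams that appear verbatim in the source.

    Long shared runs (e.g. 8+ consecutive words) rarely occur by chance,
    so a high score suggests reproduction rather than independent phrasing.
    """
    out_grams = ngrams(output, n)
    if not out_grams:
        return 0.0
    return len(out_grams & ngrams(source, n)) / len(out_grams)


source = "it was a bright cold day in april and the clocks were striking thirteen"
identical = "it was a bright cold day in april and the clocks were striking thirteen"
paraphrase = "on a chilly april day the clocks chimed thirteen times"

print(verbatim_overlap(source, identical))   # 1.0 -- full verbatim reproduction
print(verbatim_overlap(source, paraphrase))  # 0.0 -- no 8-word run shared
```

A score near 1.0 on text the model was never shown at query time is the sort of evidence experts would then tie back to the training data.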
Indeed, similar techniques have proven effective in IP cases in the past.
The law: what’s happening?
While most major jurisdictions have enacted some form of regulatory framework for AI, jurisprudential direction on output-based memorisation has not yet emerged. Even the widely publicized Anthropic class action settlement in the US turned on other infringement theories, related to the downloading of pirated books, and had nothing to do with memorisation.
But the US is currently the forum for at least nine active and pending cases involving memorisation issues. So far, it’s the only jurisdiction with such litigation: there are other widely publicized AI infringement cases in Canada and the UK, but they involve input-side copying, not memorisation.
Most of the US cases are at preliminary stages of litigation. So, it may be a while before the law has some answers.
Julius Melnitzer is a Toronto-based writer who focuses on law, legal affairs, and the business of law. Follow him on LegalWriter.net or email him at julius@legalwriter.net.