[Vidhi Kawrani and Anisha Joshi are fourth-year B.A. LL.B. (Hons.) students at the Institute of Law, Nirma University, Ahmedabad. In this piece, the authors examine the growing reliance of courts and regulators on destruction and deletion-based remedies to address unlawful data use in generative AI systems. The piece argues that such remedies are structurally incompatible with machine learning models, which do not merely store information in a removable form but internalise patterns and informational influence through distributed parameter weights, making complete erasure technologically uncertain and legally difficult to verify. By analysing recent AI copyright disputes and data protection developments in the U.S. and the EU, the piece contends that existing legal frameworks risk producing formally compliant yet substantively ineffective remedies, necessitating a shift toward regulatory models centred on traceability, auditability, lawful data sourcing, and developer accountability.]
When a court orders data destruction and a $1.5 billion settlement demands deletion of pirated datasets, a more fundamental question emerges – what does it mean to erase something from the mind of a machine?
In August 2025, the case of Bartz v. Anthropic PBC (“Bartz”) settled for what is, by any measure, a staggering sum of at least $1.5 billion, making it one of the largest copyright class action settlements in American legal history. Beyond the money, the settlement imposed a data erasure obligation that is quietly groundbreaking in AI jurisprudence. Within thirty days of a definitive ruling, Anthropic promised to remove any pirated datasets downloaded from the shadow libraries, i.e., Library Genesis (“LibGen”) and the Pirate Library Mirror (“PiLiMi”), placing data erasure at the centre of contemporary AI litigation.
More significantly, the dispute exposes a deeper structural problem in contemporary AI regulation: destruction and deletion-based remedies do not translate effectively onto machine learning systems. This blog argues that existing legal frameworks continue to rely on assumptions derived from conventional data systems, where information can be identified, isolated, and permanently removed. However, large language models function differently. Once trained on a dataset, they do not retain information in a discrete retrievable form but internalise statistical patterns and informational influence through distributed parameter weights. As a result, eliminating source datasets does not necessarily eliminate their influence from the model itself, making complete erasure technologically uncertain and difficult to verify. The growing reliance on erasure-based remedies in AI litigation therefore reveals an emerging mismatch between legal concepts of deletion and the technical architecture of machine learning systems.
The Right to Be Forgotten Meets the Machine
Data protection law provides a useful doctrinal lens for examining this tension between legal erasure and machine learning persistence. European data protection law reflects it most directly in the ‘right to be forgotten’ under Article 17 GDPR, which allows individuals to request erasure of personal data under certain circumstances.
The judgement and rule are centred on the fundamental premise that data exists in unique, identifiable forms which may be detected and eliminated. In conventional databases, deletion is operationally straightforward: a record or link can be isolated and erased without disrupting the broader system.
However, large language models (“LLMs”) do not work in this discrete way. Instead of keeping retrievable entries, they use distributed parameter weights to represent statistical correlations across datasets. Once a data point is integrated into those parameters, it no longer exists in an independently identifiable form; only its influence on model behaviour survives.
This poses substantial obstacles to traditional legal remedies. According to Higa, Bedikian, and Costa in Tech Policy Press (May 2025), “retraining the model from scratch is the only way to completely remove an individual’s data, an impractical and potentially costly solution.” Retraining sophisticated models involves significant time, energy, and computing resources, which raises major questions about proportionality and viability.
Therefore, a legal order requiring the removal of data from a trained model creates two difficult possibilities. First, meaningful compliance may require complete retraining of the model, effectively making deletion equivalent to destroying and reconstructing the system itself. Second, compliance may remain merely symbolic, achieved through the deletion of source datasets without reducing the influence already embedded within the model’s learned parameters. This reveals a deeper conflict in contemporary AI governance. While courts continue to rely on traditional notions of deletion derived from database logic, modern AI systems operate through distributed and non-extractive representations that resist conventional methods of erasure and verification.
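The second possibility, symbolic compliance, can be made concrete with a toy example. The sketch below is purely illustrative and assumes a two-parameter linear model rather than anything resembling a production LLM; the function `train`, the sample records, and the "contested" data point are all hypothetical. It shows that destroying the stored dataset leaves the trained weights untouched, and that the only way to see what the contested record contributed is to retrain without it.

```python
# Toy illustration (hypothetical; not any real LLM training pipeline).
# A two-weight linear model is trained, then its source dataset is destroyed.

def train(dataset, steps=5000, lr=0.01):
    """Fit y ≈ w * x + b by plain gradient descent and return (w, b)."""
    w, b = 0.0, 0.0
    n = len(dataset)
    for _ in range(steps):
        grad_w = sum((w * x + b - y) * x for x, y in dataset) / n
        grad_b = sum((w * x + b - y) for x, y in dataset) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

lawful = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
contested = (5.0, 20.0)  # stands in for one unlawfully sourced record

training_set = lawful + [contested]
trained = train(training_set)

# "Destruction order": the stored dataset is wiped ...
training_set.clear()

# ... but nothing above touched `trained`. To measure what the contested
# record contributed, the model has to be retrained without it.
retrained = train(lawful)
influence = (trained[0] - retrained[0], trained[1] - retrained[1])

print("weights after dataset deletion:  ", trained)
print("weights if record was never used:", retrained)
print("residual influence of the record:", influence)
```

Even in this two-parameter case, the contested record is not stored anywhere in `trained`; it survives only as a shift in the weights, which is why deletion of the source file and deletion of its influence are different operations.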
Bartz v. Anthropic: The Anatomy of the Destruction Order
Against the backdrop of this structural tension in AI governance, Bartz, 3:24-cv-05417 (N.D. Cal.), presents one of the first extensive judicial engagements with the limitations of dataset-based remedies in AI systems.
In August 2024, Kirk Wallace Johnson, Charles Graeber, and Andrea Bartz filed a lawsuit alleging that Anthropic had downloaded and stored more than seven million pirated books from shadow libraries such as LibGen and PiLiMi. In a split decision delivered in June 2025, Judge William Alsup held that the use of lawfully acquired books for training large language models could qualify as “quintessentially transformative” fair use, while simultaneously ruling that the acquisition and retention of pirated copies constituted copyright infringement. [paras. 32, 71–72]
By targeting stored datasets rather than embedded model parameters, the remedy avoids confronting whether legal erasure can meaningfully address information that has already been internalised through machine learning processes.
The case, therefore, underscores a central tension: while destruction remedies remain legally intelligible, they may not fully remediate harms once infringing material has already informed model training.
The EDPB’s Nuclear Option and the European Dimension
American courts have approached destruction remedies primarily through copyright law, particularly in Bartz and the ongoing copyright suits against OpenAI concerning the use of protected training data. By contrast, European regulators have developed parallel mechanisms through data protection jurisprudence, most notably in the Italian Garante’s enforcement action against OpenAI and the GDPR proceedings against Clearview AI concerning unlawful biometric data processing. In its December 2024 Opinion [para 114] on AI models, the European Data Protection Board (“EDPB”) observed that supervisory authorities may direct deletion of unlawfully processed datasets and, where partial deletion proves infeasible, potentially mandate “the erasure of the whole dataset used to develop the AI model and/or the AI model itself.”
Despite this increasingly assertive approach, reflected in regulators’ willingness to contemplate deletion of datasets and, in some formulations, even entire AI models, the European framework remains grounded in GDPR erasure principles designed for conventional data processing. Article 17 GDPR was not built for machine learning systems, and its application in this context raises recurring issues of feasibility and proportionality. The regulation itself reflects this tension by recognising exceptions for “disproportionate effort,” while Article 25’s “data protection by design” standard remains deliberately open-ended, tied to the evolving notion of the “state of the art.”
As a result, the European approach expands the scope of erasure without resolving the underlying challenge of implementation. Even where regulators assert the power to mandate deletion, the extent of erasure remains ambiguous, ranging from removal of specific training datasets or outputs to, in more expansive interpretations, retraining or effectively dismantling entire models. Yet in practice, the technical capacity to execute and independently verify such model-level erasure remains uncertain.
Can You Actually “Untrain” a Model? The Problem of Machine Unlearning
Machine unlearning, which aims to let AI systems effectively “forget” specific information without complete retraining, is gaining traction as a ‘data erasure’ technique for addressing potential data violations, including in AI decision-making. The first framework proposing such a method was presented by Cao and Yang at the IEEE Symposium on Security and Privacy in 2015. Since then, academic research on machine unlearning has expanded tremendously; however, current methods remain technically experimental, making it difficult to assess their effectiveness for the increasingly complex foundation models now in use. Moreover, machine unlearning presents a verification challenge for the governance and regulation of AI technologies. Although courts and regulators can mandate that data be erased, there are currently no established and accepted standards for determining whether sufficient unlearning has actually occurred. In 2025, the Leiden Law Blog acknowledged that current unlearning methods are “too new for there to be any real confidence.”
Regulators are also turning to alternative risk-mitigation measures. For instance, output-level filtering systems prevent models from reproducing non-compliant or confidential data; however, these measures control only the expression of information, not the information itself. The model’s external outputs may therefore be constrained while the influence of that information within the model remains intact.
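To show what "verified" unlearning could look like in the simplest possible setting, the sketch below uses a model whose entire state is a pair of running sums, loosely in the spirit of summation-based approaches from the early unlearning literature. Everything here is an assumption made for illustration: the class `SummationModel`, the helper `fit`, and the sample records are hypothetical, and nothing suggests that foundation models decompose this neatly. The verification step compares the unlearned model against one retrained from scratch without the contested record.

```python
# Sketch of summation-style unlearning on a one-feature least-squares model.
# Hypothetical names throughout; not a technique shown to scale to LLMs.

class SummationModel:
    """Least-squares line through the origin, y ≈ w * x, kept as two sums."""

    def __init__(self):
        self.sum_xy = 0.0
        self.sum_xx = 0.0

    def learn(self, x, y):
        self.sum_xy += x * y
        self.sum_xx += x * x

    def forget(self, x, y):
        # Subtract exactly the contribution this record made when learned.
        self.sum_xy -= x * y
        self.sum_xx -= x * x

    @property
    def weight(self):
        return self.sum_xy / self.sum_xx


def fit(records):
    """Train a fresh model on the given (x, y) records."""
    model = SummationModel()
    for x, y in records:
        model.learn(x, y)
    return model


records = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 20.0)]
erasure_request = (4.0, 20.0)

# Route 1: unlearn the record from the already-trained model.
unlearned = fit(records)
unlearned.forget(*erasure_request)

# Route 2: the gold standard, retraining from scratch without the record.
retrained = fit([r for r in records if r != erasure_request])

# Verification: the two routes should agree.
gap = abs(unlearned.weight - retrained.weight)
verified = gap < 1e-9
print("unlearning verified against retraining:", verified)
```

The check works here only because the retrained baseline is cheap to produce. For a large model, that baseline is the very retraining run that unlearning is meant to avoid, which is one way of seeing why accepted verification standards are still missing.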
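A minimal sketch of output-level filtering makes the distinction visible. It assumes a stand-in function in place of a real model; `toy_model`, `filtered_generate`, and the blocked passage are hypothetical, and real deployments would use far more sophisticated matching.

```python
# Hypothetical output-level filter wrapped around a stand-in "model".

BLOCKED_PASSAGES = ["the secret recipe is paprika"]


def toy_model(prompt):
    """Stand-in for a trained model that has memorised a protected passage."""
    if "recipe" in prompt.lower():
        return "Sure: the secret recipe is paprika."
    return "I can help with that."


def filtered_generate(prompt):
    """Return the model output unless it reproduces a blocked passage."""
    raw = toy_model(prompt)
    if any(passage in raw.lower() for passage in BLOCKED_PASSAGES):
        return "[withheld by output filter]"
    return raw


shown = filtered_generate("What is the recipe?")   # what the user sees
underlying = toy_model("What is the recipe?")      # what the model still produces
print("shown to user:", shown)
print("produced by model:", underlying)
```

The filter changes what reaches the user, but `toy_model` itself is unchanged: remove or bypass the wrapper and the protected passage is reproduced as before.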
Rethinking Remedies: From Erasure to Accountability
The limitations of machine unlearning ultimately point toward a broader regulatory conclusion: frameworks centred exclusively on deletion are increasingly ill-suited to the architecture of contemporary AI systems.
The outcome in Bartz illustrates a pragmatic compromise. By targeting stored datasets rather than trained models, the remedy addresses concrete acts of infringement while avoiding the technical and economic costs of full retraining. However, this approach also leaves unresolved a more specific problem this blog engages with: how legal systems should respond when infringing or unlawful material, once used in training, becomes diffused within model parameters and no longer exists as a retrievable or verifiable data point.
Emerging regulatory approaches suggest a gradual shift away from strict erasure towards broader accountability mechanisms. Legislative initiatives in both the United States and the European Union increasingly emphasise lawful data sourcing, transparency, and auditability as core principles of AI governance. These measures operate ex ante, seeking to prevent harm at the stage of data collection and model design rather than relying solely on ex post remedies.
This shift is both necessary and inevitable. Once data has been incorporated into a model’s architecture, complete erasure becomes, in practical terms, unattainable. Legal frameworks must therefore adapt by prioritising obligations that can be meaningfully implemented and verified.
In this context, the future of AI regulation lies not in mandating perfect forgetting, but in building robust systems of developer accountability. In current industry practice, however, deletion of user data is largely controlled by AI providers through internal retention policies and compliance workflows, rather than any direct or automated mechanism that ensures selective forgetting within the model itself. Although emerging techniques such as machine unlearning and structured data deletion pipelines aim to approximate targeted removal, they remain experimental, limited in scope, and difficult to verify at scale. The central challenge for AI governance is therefore whether existing legal and technical infrastructures can deliver meaningful, auditable deletion in systems that are fundamentally designed without true retrievability or verifiable forgetting.