LmCast :: Stay tuned in

Blocking Internet Archive Won't Stop AI, but Will Erase Web's Historical Record

Recorded: March 21, 2026, 10 p.m.

Original


Blocking the Internet Archive Won’t Stop AI, But It Will Erase the Web’s Historical Record

DEEPLINKS BLOG

By Joe Mullin | March 16, 2026


Imagine a newspaper publisher announcing it will no longer allow libraries to keep copies of its paper. 
That’s effectively what’s begun happening online in the last few months. The Internet Archive—the world’s largest digital library—has preserved newspapers since it went online in the mid-1990s. The Archive’s mission is to preserve the web and make it accessible to the public. To that end, the organization operates the Wayback Machine, which now contains more than one trillion archived web pages and is used daily by journalists, researchers, and courts.
But in recent months The New York Times began blocking the Archive from crawling its website, using technical measures that go beyond the web’s traditional robots.txt rules. That risks cutting off a record that historians and journalists have relied on for decades. Other newspapers, including The Guardian, seem to be following suit. 
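For context, robots.txt is the web's traditional, voluntary opt-out convention: a plain-text file at a site's root telling crawlers which user agents may fetch which paths. Below is a minimal sketch of that convention using only Python's standard library. The publisher URL is a placeholder, and "ia_archiver" is assumed here as the user-agent token historically associated with the Internet Archive's crawler; the measures described above go beyond this convention by refusing the crawler at the server itself, regardless of what robots.txt says.

```python
# Minimal sketch of the voluntary robots.txt convention, standard library only.
# The site URL is a placeholder, and "ia_archiver" is an assumption (the token
# historically associated with the Internet Archive's crawler); the article
# doesn't specify which tokens publishers target.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")  # hypothetical publisher
rp.read()  # fetch and parse the site's rules

article = "https://www.example.com/2026/03/some-article.html"
for agent in ("ia_archiver", "Googlebot", "*"):
    verdict = "allowed" if rp.can_fetch(agent, article) else "disallowed"
    print(f"{agent}: {verdict}")
```

A well-behaved archival crawler honors whatever this check returns; the point of the newer blocking measures is that they don't depend on the crawler's cooperation at all.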
For nearly three decades, historians, journalists, and the public have relied on the Internet Archive to preserve news sites as they appeared online. Those archived pages are often the only reliable record of how stories were originally published. In many cases, articles get edited, changed, or removed—sometimes openly, sometimes not. The Internet Archive often becomes the only source for seeing those changes. When major publishers block the Archive’s crawlers, that historical record starts to disappear.
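To make that reliance concrete: the Wayback Machine exposes a public availability endpoint (https://archive.org/wayback/available) that returns the archived snapshot of a URL closest to a requested date. A minimal sketch, with a hypothetical article URL:

```python
# Sketch of how a researcher might look up the archived copy of an article
# via the Wayback Machine's public availability API. The article URL below
# is hypothetical; once a publisher blocks the Archive's crawler, lookups
# like this start coming back empty for newly published pages.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"

def closest_snapshot(url: str, timestamp: str = "") -> str | None:
    """Return the URL of the snapshot closest to `timestamp` (YYYYMMDD...), if any."""
    with urlopen(f"{API}?{urlencode({'url': url, 'timestamp': timestamp})}") as resp:
        data = json.load(resp)
    snap = data.get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap and snap.get("available") else None

print(closest_snapshot("https://www.example.com/2026/03/some-article.html",
                       timestamp="20260316"))
```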
The Times says the move is driven by concerns about AI companies scraping news content. Publishers seek control over how their work is used, and several—including the Times—are now suing AI companies over whether training models on copyrighted material violates the law. There’s a strong case that such training is fair use. 
Whatever the outcome of those lawsuits, blocking nonprofit archivists is the wrong response. Organizations like the Internet Archive are not building commercial AI systems. They are preserving a record of our history. Turning off that preservation in an effort to control AI access could essentially torch decades of historical documentation over a fight that libraries like the Archive didn’t start, and didn’t ask for. 
If publishers shut the Archive out, they aren’t just limiting bots. They’re erasing the historical record. 
Archiving and Search Are Legal 
Making material searchable is a well-established fair use. Courts have long recognized it’s often impossible to build a searchable index without making copies of the underlying material. That’s why when Google copied entire books in order to make a searchable database, courts rightly recognized it as a clear fair use. The copying served a transformative purpose: enabling discovery, research, and new insights about creative works. 
The Internet Archive operates on the same principle. Just as physical libraries preserve newspapers for future readers, the Archive preserves the web’s historical record. Researchers and journalists rely on it every day. According to Archive staff, Wikipedia alone links to more than 2.6 million news articles preserved at the Archive, spanning 249 languages. And that’s only one example. Countless bloggers, researchers, and reporters depend on the Archive as a stable, authoritative record of what was published online.
The same legal principles that protect search engines must also protect archives and libraries. Even if courts place limits on AI training, the law protecting search and web archiving is already well established.
The Internet Archive has preserved the web’s historical record for nearly thirty years. If major publishers begin blocking that mission, future researchers may find that huge portions of that historical record have simply vanished. There are real disputes over AI training that must be resolved in courts. But sacrificing the public record to fight those battles would be a profound, and possibly irreversible, mistake. 

Related Issues: Artificial Intelligence, Creativity & Innovation




Summarized

The Electronic Frontier Foundation (EFF) highlights a concerning trend: major news publishers, such as The New York Times and The Guardian, are blocking the Internet Archive’s Wayback Machine from archiving their websites. This action, driven by concerns about AI companies scraping news content, poses a significant risk of erasing a critical historical record. According to Joe Mullin, this is not an effective strategy to combat the potential misuse of AI, as it fundamentally threatens the preservation of web history. The EFF argues that blocking the Archive’s crawlers, which operate within established legal frameworks, is a misguided response to a complex issue.

The core of the argument rests on the legal precedent established around search engines and archiving. Courts have repeatedly recognized that making material searchable, including creating indexes, constitutes fair use, particularly when it facilitates discovery and research. The Internet Archive's operation mirrors this principle: it preserves the web's historical record, much as physical libraries archive newspapers. The EFF emphasizes that the Archive isn't building commercial AI systems; it's dedicated to safeguarding public knowledge. As evidence of its crucial role, they point to Wikipedia's reliance on the Archive: Wikipedia links to more than 2.6 million news articles preserved there, spanning 249 languages.

Furthermore, the EFF contends that by attempting to control access to archived content, publishers are not simply limiting bots. They are actively erasing a foundational record of online journalism and public discourse. The legal battles currently underway regarding AI training on copyrighted material are separate from the fundamental right of archives to preserve and provide access to historical information. The EFF stresses that sacrificing this public record in the pursuit of controlling AI access would represent a profound and potentially irreversible mistake. The conflict between the publishers and the Internet Archive underscores a broader struggle over control of information and its accessibility, particularly in an era of rapidly evolving technologies like artificial intelligence.