New York Times Doesn’t Want Its Stories Archived

Nikita Mazurov

The New York Times tried to block a web crawler affiliated with the famous Internet Archive, a project whose easy-to-use comparisons of article versions have sometimes led to embarrassment for the newspaper.

In 2021, the New York Times added “ia_archiver” to its robots.txt file, a list that instructs certain crawlers to stay out of a website. The bot had, in the past, captured huge numbers of websites for the Internet Archive.
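A robots.txt entry that bars a single named crawler from an entire site takes a simple, standardized form. The snippet below is an illustrative sketch of such an entry, not a reproduction of the Times’s actual file:

```
# Illustrative robots.txt rules: bar one named bot from the whole site
User-agent: ia_archiver
Disallow: /
```

Compliance is voluntary: the file is a request that well-behaved crawlers honor, not a technical barrier.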

Crawlers are automated programs that trawl websites, collecting data and sending it back to a repository, a process known as scraping. Such bots power search engines and the Internet Archive’s Wayback Machine, a service that archives and displays historic versions of websites going back to 1996.
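The effect of such a robots.txt rule can be checked with Python’s standard-library `urllib.robotparser`, which implements the same exclusion logic that compliant crawlers apply. The rules and URL below are illustrative assumptions, not the Times’s actual file:

```python
from urllib import robotparser

# Illustrative robots.txt rules (not the Times's actual file):
# bar the "ia_archiver" bot from every path on the site.
rules = """\
User-agent: ia_archiver
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

url = "https://example.com/2021/some-article.html"  # hypothetical URL

# The named bot is barred from every path...
print(rp.can_fetch("ia_archiver", url))  # -> False
# ...while crawlers not named in the file remain unaffected.
print(rp.can_fetch("Googlebot", url))    # -> True
```

A crawler that respects the protocol consults this file before fetching any page; one that ignores it can still scrape the site, which is why robots.txt is a convention rather than an enforcement mechanism.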