Skip to main content

Internet Archive will ignore robots.txt files to keep historical record accurate

The Internet Archive has announced that going forward, it will no longer conform to directives given by robots.txt files. These files are predominantly used to advise search engines on which portions of the page should be crawled and indexed to help facilitate search queries.

In the past, the Internet Archive has complied with instructions laid out by robots.txt files, according to a report from Boing Boing. However, it has been decided that the way that these files are calibrated is often at odds with the service that the site sets out to provide.

Recommended Videos

“Over time we have observed that the robots.txt files that are geared toward search engine crawlers do not necessarily serve our archival purposes,” stated a blog post that the organization published last week. “Internet Archive’s goal is to create complete ‘snapshots’ of web pages, including the duplicate content and the large versions of files.”

Robots.txt files are increasingly being used to remove entire domains from search engines following their transition from a live, accessible site to a parked domain. If a site goes out of business, and is rendered inaccessible in this way, it also becomes unavailable for viewing via the Internet Archive’s Wayback Machine. The organization apparently receives queries about these sites on a daily basis.

The Internet Archive hopes that disregarding robots.txt files will help contribute to an accurate representation of prior points in the web’s history, removing their capacity to muddy the waters with instructions intended for search engines.

The organization has already ceased referring to robots.txt files on sites and pages related to the U.S. government and the U.S. military, to account for the enormous changes that can be made to domains between one administration and the next. This decision has caused no major problems, so there are high hopes that discontinuing the use of the files more broadly will be helpful.

Brad Jones
Brad is an English-born writer currently splitting his time between Edinburgh and Pennsylvania. You can find him on Twitter…
Apple loses AI whiz to Meta with an offer that will make your eyes water
Meta AI widget on Home Screen.

It was just last month that OpenAI boss Sam Altman claimed that Meta had been trying to poach his top AI engineers by offering hiring bonuses of as much as $100 million.

There was renewed interest in the matter earlier this week when it emerged that Ruoming Pang, an esteemed AI engineer who oversaw Apple’s AI models, had jumped ship to Meta.

Read more
I found the best Prime Day deal on a tablet hidden beyond Amazon
Microsoft Surface Pro 12-inch, stylus, and keyboard.

A good tablet can take your productivity to the next level, but a boring one will find a niche use and eat dust on a table or couch for most of its time. I love iPads and have been pushing them – as far as I can — to act as my primary computing machine for nearly half a decade now. It has never managed to replace a proper laptop, like a MacBook Air or a Windows machine. 

Why not buy a Windows laptop, you might ask? Well, Windows-powered tablets, especially those Surface devices sold by Microsoft, are pretty expensive. I love the new 12-inch Surface Pro, but at $799, it felt like a steep purchase despite its impressive specifications. 

Read more
Prime Day is over, but this powerful Dell laptop is still at its lowest price
The Dell Vostro 3530 laptop on a white background.

Prime Day is already over, but that doesn't mean that there are no more laptop deals for you to shop on Amazon. Here's one that caught our eye -- the Dell Vostro 3530 with 32GB of RAM for its lowest-ever price of $649, following a 28% discount on its original price of $899. This limited-time offer of $250 off may not last much longer though, so if you want to take advantage of this bargain, we highly recommend that you finalize your purchase for this device as soon as you can.

Buy Now

Read more