Yesterday, Google announced that it has teamed up with Martijn Koster, the creator of the Robots Exclusion Protocol (REP), and other webmasters to make the 25-year-old protocol an internet standard. The REP, better known as robots.txt, has now been submitted to the IETF (Internet Engineering Task Force). Google has also open-sourced its robots.txt parser and matcher as a C++ library.
https://twitter.com/googlewmc/status/1145634145261051906
REP was created back in 1994 by Martijn Koster, a software engineer known for his contributions to internet search. Since its inception, it has been widely adopted by websites to indicate whether web crawlers and other automated clients are allowed to access the site.
When an automated client wants to visit a website, it first checks for a robots.txt file, which contains something like this:
User-agent: *
Disallow: /
The User-agent: * statement means that the rules apply to all robots, and Disallow: / means that robots are not allowed to visit any page on the site.
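To make the effect of these two lines concrete, here is a minimal sketch of how a crawler might evaluate them, using Python's standard-library urllib.robotparser module. This is not Google's open-sourced C++ parser, and the helper name is_allowed, the user agent "ExampleBot", and the example.com URL are assumptions chosen purely for illustration.

```python
from urllib.robotparser import RobotFileParser

# The example rules from the article: every robot is barred from every page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text.

    Hypothetical helper for illustration; it parses the rules from a string
    instead of downloading them from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the URL are made-up values.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/index.html"))  # prints False
```

Because the wildcard group applies to every user agent and the Disallow rule matches every path, the check returns False for any URL on the site. Note that the standard-library parser is only one interpretation of the protocol, which is the kind of divergence the proposed standard aims to reduce.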
Despite being widely used on the web, REP is still not an internet standard. With no set-in-stone rules, developers have interpreted the “ambiguous de-facto protocol” differently over the years. It has also not been updated since its creation to address modern corner cases. The proposed draft is a standardized and extended version of REP that gives publishers fine-grained control over what they would like to be crawled on their site and potentially shown to interested users.
The following are some of the important updates in the proposed REP:
This updated REP standard is currently in its draft stage, and Google is now seeking feedback from developers. It wrote, “we uploaded the draft to IETF to get feedback from developers who care about the basic building blocks of the internet. As we work to give web creators the controls they need to tell us how much information they want to make available to Googlebot, and by extension, eligible to appear in Search, we have to make sure we get this right.”
For more details, check out the official announcement by Google and the proposed REP draft.