May 08, 2007
Teaching New Tricks to Old Robots, New Robots.txt Command for SEOs.
By Scott Goodyear
Sitemaps.org, a collaborative effort by Yahoo, Microsoft, and Google, has announced a change in robots.txt usage. The change lets you tell participating search engines where you keep the sitemap on your site. The usage is fairly simple, and it is likely that most other engines will soon support this standard.
How To Use The Robots.txt Sitemap Tag
Normally you will want to use a robots.txt file to tell search engines to stay out of certain files or folders. Here is an example in which Google's own robots.txt file excludes many folders from being indexed by its own robot, Googlebot, as well as by other search engine robots, by using the user agent "*":
    User-agent: *
    Allow: /searchhistory/
    Disallow: /news?output=xhtml&
    Allow: /news?output=xhtml
    Disallow: /search
    Disallow: /groups
    Disallow: /images
    Disallow: /catalogs
    Disallow: /catalogues
    Disallow: /news
    Disallow: /nwshp
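If you want to check how a set of rules like these is read, here is a minimal sketch using Python's standard `urllib.robotparser` module. It is only an illustration: the rules are a shortened copy of the example above, the paths being tested are made up, and Python's parser applies the first matching rule, which may differ from how a given search engine resolves Allow and Disallow conflicts.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rules shown above.
RULES = """\
User-agent: *
Allow: /searchhistory/
Disallow: /search
Disallow: /groups
Disallow: /images
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical paths, chosen only to exercise the rules.
for path in ["/searchhistory/", "/search", "/images/logo.gif", "/about"]:
    verdict = "allowed" if parser.can_fetch("Googlebot", path) else "blocked"
    print(f"{path}: {verdict}")
```

Because the rules sit under the "*" user agent, any robot name passed to `can_fetch` gets the same answer here.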
But now you can also tell a search engine exactly where you keep your sitemap by including a line in the robots.txt file. For example, there is a "sitemap" link at the bottom of most of our MarketPosition pages, but we could also point search engine robots to our HTML sitemap, or to the text version of our site map, by including the new sitemap command in our robots.txt file. (You don't need both an HTML and a text version; we're just testing to see which seems to work better.) It looks like this:
    User-agent: *
    Allow:
    Sitemap: http://www.marketposition.com/textsitemap.txt
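To see what a robot could pull out of that file, here is a rough sketch that reads the Sitemap line with Python's standard `urllib.robotparser` module. It assumes Python 3.8 or later, where `site_maps()` is available, and it parses the text directly instead of fetching it from the live site.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from the example above.
ROBOTS_TXT = """\
User-agent: *
Allow:
Sitemap: http://www.marketposition.com/textsitemap.txt
"""

sitemap_parser = RobotFileParser()
sitemap_parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if the file lists none.
sitemap_urls = sitemap_parser.site_maps()
print(sitemap_urls)
```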
In the past, the robots.txt file was pretty one-dimensional in purpose. It served only to exclude robots from certain areas of a site. If you wanted to keep them from indexing your images directory, you used a disallow command. If robots were hammering your site with page requests as they crawled, slowing it down, some engines let you set a delay option in the robots.txt file. Below, I've added a command to slow Microsoft's robot down when indexing:
    User-agent: *
    Allow:

    User-agent: msnbot
    Crawl-delay: 60

    Sitemap: http://www.marketposition.com/textsitemap.txt
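As a quick sanity check on a file like this, the sketch below uses Python's standard `urllib.robotparser` module to read back the delay for each robot. It assumes Python 3.8 or later and only shows how this one parser interprets the file; each search engine decides for itself whether and how to honor a crawl delay.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from the example above.
ROBOTS_WITH_DELAY = """\
User-agent: *
Allow:

User-agent: msnbot
Crawl-delay: 60

Sitemap: http://www.marketposition.com/textsitemap.txt
"""

delay_parser = RobotFileParser()
delay_parser.parse(ROBOTS_WITH_DELAY.splitlines())

# crawl_delay() returns the delay for the named robot, or None if none is set.
msnbot_delay = delay_parser.crawl_delay("msnbot")
googlebot_delay = delay_parser.crawl_delay("Googlebot")
print("msnbot:", msnbot_delay)
print("Googlebot:", googlebot_delay)
```

Only msnbot has its own entry with a delay, so other robots fall back to the "*" entry, which sets none.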
Further Reading And Discussion
If you want to read more about robots.txt and site maps, consider the following resources:
Controlling Search Engine Robots With robots.txt and Other Methods.
How to Identify Legitimate Search Robots.
The Robotstxt.org and Sitemaps.org web sites.
For more discussion on these topics, consider:
HighRankings.com and WebmasterWorld.
