
November 17, 2006

Microsoft, Google and Yahoo! Adopt Standard Sitemap Protocol

by Richard Drawhorn

The three major search engines have agreed on a sitemap protocol to make it easier for search crawlers to index web site content. In this post, I'll take a detailed look at the agreed-upon sitemap protocol and provide examples. This should interest you if your web site does not yet have a sitemap, or if you want to make sure your existing sitemap is in the correct format.

Microsoft, Google and Yahoo! each announced their acceptance of the common sitemap protocol on their search blogs. Sitemaps.org is the official web site describing the protocol, and I recommend taking the time to study it and create a sitemap that conforms to the format outlined there.

The accepted standard is a sitemap written in XML, a markup language that allows information to be shared and understood across a wide variety of platforms and software. The language is fairly intuitive and not difficult to learn. Like HTML, XML tags are enclosed in angle brackets, and each element has an opening and a closing tag. For example, here is the simplest possible XML sitemap, which contains only a single URL entry:


<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.marketposition.com/</loc>
   </url>
</urlset>

The above is in fact a complete sitemap because it contains the essential elements:


  • wraps all entries in an opening and closing <urlset> tag
  • declares the protocol standard through the xmlns (namespace) attribute of <urlset>
  • contains a <url> entry, with a <loc> tag inside it, for each page listed in the sitemap.

Note that every tag has a matching closing tag and that the tags are nested. The first line is the XML declaration, which states the XML version and the character encoding of the document. The sitemap protocol requires the document to be UTF-8 encoded.
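If you'd rather generate a sitemap than type it by hand, the following Python sketch shows one way to produce the minimal format above with the standard library's xml.etree.ElementTree module. It assumes Python 3.8 or later (for the xml_declaration and default_namespace arguments), and the function name build_minimal_sitemap is hypothetical, not part of any search engine's tooling.

```python
import xml.etree.ElementTree as ET

# Namespace URI from the protocol's <urlset> declaration.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_minimal_sitemap(page_urls):
    """Return a UTF-8 encoded sitemap (bytes) with one <url>/<loc> per page.

    Hypothetical helper for illustration only.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in page_urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url, f"{{{SITEMAP_NS}}}loc")
        loc.text = page  # characters such as & are escaped on output
    # default_namespace writes xmlns="..." on <urlset> instead of an ns0: prefix;
    # xml_declaration adds the <?xml ...?> line naming the UTF-8 encoding.
    return ET.tostring(
        urlset,
        encoding="UTF-8",
        xml_declaration=True,
        default_namespace=SITEMAP_NS,
    )


print(build_minimal_sitemap(["http://www.marketposition.com/"]).decode("utf-8"))
```

ElementTree writes the declaration with single quotes rather than double quotes, which is equally valid XML. It also escapes characters such as & inside URLs, which the protocol requires for all data values.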

The page location (<loc>) is the only required element within <url>, but a few optional tags can be added: <lastmod>, the date the page was last modified; <changefreq>, how often the page is likely to change; and <priority>, the importance of the page relative to other pages on the same site. The priority is a number from 0.0 to 1.0, and the default is 0.5. The idea is to give a higher value to the pages you consider most important, so that crawlers know which of your pages to favor. Note that the priority you set here does not affect how your pages rank against pages on other sites in search results; search engines use it only to decide which of your own pages to emphasize when crawling.
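The rules for these optional tags are simple enough to check automatically. Here is a minimal Python sketch, assuming the values arrive as the strings found in the XML and that None means a tag was omitted. The function check_url_entry is hypothetical; the allowed changefreq values are the ones listed on sitemaps.org. As a simplification, only the date-only YYYY-MM-DD form of lastmod is accepted, even though the protocol also allows a full W3C Datetime with a time portion.

```python
import re
from datetime import datetime

# <changefreq> values listed on sitemaps.org.
ALLOWED_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
DATE_ONLY = re.compile(r"\d{4}-\d{2}-\d{2}")


def _is_date_only(text):
    """True for a valid calendar date written as YYYY-MM-DD (simplified check)."""
    if not DATE_ONLY.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")  # rejects e.g. month 13
    except ValueError:
        return False
    return True


def _is_valid_priority(text):
    """True if text parses as a number from 0.0 to 1.0 inclusive."""
    try:
        value = float(text)
    except ValueError:
        return False
    return 0.0 <= value <= 1.0  # also False for NaN


def check_url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Return a list of problems in one <url> entry; an empty list means OK.

    Hypothetical helper for illustration only.
    """
    problems = []
    if not loc:
        problems.append("<loc> is required")
    if lastmod is not None and not _is_date_only(lastmod):
        problems.append(f"<lastmod> {lastmod!r} is not a YYYY-MM-DD date")
    if changefreq is not None and changefreq not in ALLOWED_CHANGEFREQ:
        problems.append(f"<changefreq> {changefreq!r} is not an allowed value")
    if priority is not None and not _is_valid_priority(priority):
        problems.append(f"<priority> {priority!r} must be between 0.0 and 1.0")
    return problems


# The home page entry from the two-page example passes every check.
print(check_url_entry("http://www.marketposition.com/", "2006-11-20", "monthly", "1.0"))
```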

The next example shows a complete sitemap that lists two pages, with the optional elements mentioned above included within each <url> tag:


<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.marketposition.com/</loc>
      <lastmod>2006-11-20</lastmod>
      <changefreq>monthly</changefreq>
      <priority>1.0</priority>
   </url>
   <url>
      <loc>http://www.marketposition.com/syndication_news.php</loc>
      <lastmod>2006-06-12</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
</urlset>


Note how each page is listed in its own <url> tag and contains the required <loc> tag and the optional tags within it. In the above example, I set the priority of the MarketPosition.com home page to 1.0, the highest possible priority. Take a look at the complete MarketPosition.com sitemap I created using the accepted protocol. I've set the priorities of the home page and category pages only, and decided not to include any of the other optional tags. The new sitemap has been submitted to both Google's Sitemap program and the Yahoo! Site Explorer program. If you haven't already done so, I recommend that you sign up for both programs and submit a sitemap in the accepted format.
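Because <priority> is optional, a page that omits it is treated as having the default value of 0.5. The Python sketch below reads a sitemap with the standard library's xml.etree.ElementTree and reports each URL's effective priority, which is a quick way to review a sitemap where, as in mine, only some pages set a priority. The function name effective_priorities and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for ElementTree's namespace-aware find methods.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
DEFAULT_PRIORITY = 0.5  # protocol default when <priority> is omitted


def effective_priorities(sitemap_bytes):
    """Map each page's <loc> to its <priority>, or the default if absent."""
    root = ET.fromstring(sitemap_bytes)
    result = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        text = (url.findtext("sm:priority", namespaces=NS) or "").strip()
        result[loc] = float(text) if text else DEFAULT_PRIORITY
    return result


# Hypothetical sitemap: only the home page sets a priority.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/</loc>
      <priority>1.0</priority>
   </url>
   <url>
      <loc>http://www.example.com/articles/</loc>
   </url>
</urlset>"""

print(effective_priorities(SAMPLE))
```

Since priority is relative, setting every page to 1.0 would leave the crawlers no way to tell your pages apart; reserving high values for a few key pages keeps the setting meaningful.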

Conclusion

Since all three major search engines have agreed on a standard sitemap protocol, it's in the best interest of every web site manager to create a sitemap that conforms to it. Because the protocol is based on XML, a sitemap is straightforward to create and maintain. The format is clearly defined at the official sitemaps web site: www.sitemaps.org.

