
November 28, 2006

No Ranking In Stone, Understanding Google Data Centers

by Scott Goodyear
www.marketposition.com

Ever work to obtain top 10 rankings, achieve them, and then hear from a client or manager that your hard work cannot be found in their web search? Many long-time SEOs know that search results are variable and that no search engine ranking is cast in stone; however, not everyone knows how to explain this. In this article, I'll explore a few of the reasons why search results can differ, and I will also point you to a few resources that you may wish to reference if you face this discussion.

Would you like to (Google) dance? (Updates Occur All The Time Now.)

If you've been performing SEO for any amount of time, you've likely heard of the 'Google Dance', especially if you've attended one of the Search Engine Conferences in San Jose (near Google HQ). In the past, the Google Dance described the time each month when Google's data centers were updated or synchronized with new crawl data, hardware, or other information. As this often occurred at the beginning or end of the month, it was a time to wonder whether the new updates to the Google ranking method or algorithm would improve or lower rankings for a web site. Many people believe that Google still works the way this old description of the dance suggests. These days, however, Google is constantly updating its servers with new crawl information from its spiders, so it is usually harder to pinpoint when or if an update has been made to its algorithm. In a ClickZ interview last May, Google engineer Matt Cutts said:

"...around 2003 Google switched to a new method of updating its index. Before that we had monthly Google dances. So as a result, new data is always being folded into the index. It's not like there was one pivotal moment when anyone can say, "Hah! This is the change!" In fact, even at different data centers we have different binaries, different algorithms, different types of data always being tested..."

Other engines such as Yahoo periodically announce 'weather reports' on their Yahoo! Search blog. These reports sometimes coincide with major shifts in rankings. Sometimes these weather reports simply announce new support in their spider for items such as meta tags and robots.txt files.

Balancing the load with multiple data centers. Multiple possible search results.

Imagine for a moment that a hugely popular piece of news has just rocked the world. It could be an act of nature, some political news, celebrity gossip, or just about anything else with a large audience who might turn to a search engine to learn more. With search queries coming in non-stop and from various locations, search engines must "load balance" the volume of search queries so that all users have a relatively quick search experience. This may mean that some searches cross 300+ machines in order to give the searcher the fastest possible experience. This also means that engines like Google are building new data centers, both to better handle search queries and to lower costs.

While it is good to have a fast experience, an engine's data centers are not always synchronized with one another. Thus one searcher might find the news in their search, whereas another may not. For example, one data center could have a database of stored info on your site that was collected back in October, whereas another data center could have a copy of your site's indexed content from just a few days ago. It is very possible that the data center with October's data will be updated with the spider's current data very shortly, but there are no guarantees. Because of this, very different search results can occur simply because a database is out of date, because the engine is testing a new method of ranking sites, or for other reasons. Below, you will find a few screen shots of a simple search that I ran before and after the US holiday of Thanksgiving. Notice the differences and similarities in search results.


In one search of Google for the phrase 'thanksgiving turkey' on 11/21/06 (before Thanksgiving), these are the last 4 search results on the first page of a Google search:
[Screen shot: Google Data Center Differences.]
A few minutes after this, I ran another Google search and two new sites showed up in rankings 7 and 8 on the first page of search results. I've likely hit a different Google Data Center below:
[Screen shot: Possibly a different Google Data Center showing results.]
Same search on 11/28/06 (after Thanksgiving). One new result shows up (in red), one result moved up (happydaycards), and two of the earlier URLs maintain their positions:
[Screen shot: A post holiday search against Google shows new results.]
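
The effect seen in the searches above can be imitated with a toy model. The sketch below is only an illustration: the data center names, crawl dates, and URLs are invented, and the round-robin rotation is an assumption, since very little is known about how Google actually routes a query.

```python
from itertools import cycle

# Hypothetical data centers, each holding its own copy of the index.
# Names, crawl dates, and URLs are invented for illustration only.
DATA_CENTERS = {
    "dc-a": {
        "crawled": "2006-10-15",
        "results": ["site1.example", "site2.example", "site3.example"],
    },
    "dc-b": {
        "crawled": "2006-11-25",
        "results": ["site1.example", "site4.example", "site2.example"],
    },
}


def make_balancer(names):
    """Return a function that hands out data center names in round-robin order."""
    rotation = cycle(names)
    return lambda: next(rotation)


def search(query, pick_data_center):
    """Answer a query from whichever data center the balancer picks.

    The query text is ignored in this toy model; every data center
    simply returns its stored result list.
    """
    name = pick_data_center()
    center = DATA_CENTERS[name]
    return name, center["crawled"], center["results"]


pick = make_balancer(sorted(DATA_CENTERS))
for _ in range(3):
    name, crawled, results = search("thanksgiving turkey", pick)
    print(name, crawled, results)
```

Running the same query three times in a row lands on dc-a, then dc-b, then dc-a again, so the searcher sees two different result lists within moments of each other.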

There are even some online tools that allow you to check whether there are any changes among the various Google data centers. Unfortunately, very little is known about which data centers will be used in the future, which are testing data centers, or which are in the pool of available/live data centers when a search is performed. From time to time you will see SEOs speculate in the various SEO forums when major changes are found at one or more data centers. When ranking differences are found between a normal search and searches made at specific data centers, and these differences persist or appear to be propagating across many data centers as time passes, various theories are often formed, whether they are true, false, or some shade in between. The theory of the Google Sandbox and the more recent "Minus Thirty" penalty are just two of the theories that have been formed by SEOs trying to understand Google's data centers and the reasons behind ranking anomalies.
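
If you collect result lists by hand from two searches (or two data centers), a few lines of Python can summarize what changed. This is a minimal sketch, not one of the online tools mentioned above; the function name `compare_rankings` and the example URLs are hypothetical.

```python
def compare_rankings(before, after):
    """Compare two ranked lists of URLs (position 1 is the top result).

    Returns the URLs that are new, the URLs that dropped out, and the
    URLs that appear in both lists but at a different position.
    """
    before_pos = {url: i for i, url in enumerate(before, start=1)}
    after_pos = {url: i for i, url in enumerate(after, start=1)}

    new = [url for url in after if url not in before_pos]
    dropped = [url for url in before if url not in after_pos]
    moved = {
        url: (before_pos[url], after_pos[url])
        for url in after
        if url in before_pos and before_pos[url] != after_pos[url]
    }
    return {"new": new, "dropped": dropped, "moved": moved}


# Hypothetical result lists copied down from two searches.
first_search = ["a.example", "b.example", "c.example", "d.example"]
second_search = ["a.example", "c.example", "e.example", "d.example"]

print(compare_rankings(first_search, second_search))
```

Keeping a dated record of these summaries makes it easier to tell whether a difference persists at one data center or is spreading to others over time.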

Location, location, location. Where you search from matters.

Depending on where your customer is located, they may obtain regionalized search results based on their IP address. An IP address is a numeric address that often helps to identify a computer on a network. In some ways this is similar to how a street address might identify a business' physical location on a map, or how caller ID may identify the caller from a particular phone or company.
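
To make the idea concrete, here is a rough sketch of how an IP address might be matched to a region. It assumes a simple lookup table of address ranges; the table below is invented and uses reserved documentation ranges, not real allocations, and real geolocation services rely on far larger databases.

```python
import ipaddress

# Hypothetical range-to-region table. These are reserved documentation
# ranges (RFC 5737), used here only as stand-ins for real allocations.
REGION_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "Portland, Oregon, US"),
    (ipaddress.ip_network("198.51.100.0/24"), "London, UK"),
]


def guess_region(ip_text):
    """Return the region whose range contains the address, or None if unknown."""
    address = ipaddress.ip_address(ip_text)
    for network, region in REGION_TABLE:
        if address in network:
            return region
    return None


print(guess_region("192.0.2.44"))
print(guess_region("203.0.113.5"))
```

The lookup only sees the address that reaches the server, so anything that changes that address also changes the region the engine assumes.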

While it is not always obvious whether normal search results are influenced by location, there are often regional pay per click ads or 'local listings' which are very obvious in their geotargeting. Below you will see a screen shot of Google pay per click advertisements and some local results presented by Live.com when searching from Portland, Oregon:


Google PPC advertisements display geotargeting:
[Screen shot: Google PPC Ads Display Geotargeting]
Microsoft Live Search displays geotargeting:
[Screen shot: MSN Local Listings Geotargeting]

In the real world, Post Office Boxes, mail rooms, or similar services sometimes obscure where a business really exists or operates from. On the web, geotargeting can be hindered intentionally or accidentally by searchers as well. Some searchers use a proxy to surf the web anonymously. Some web searchers may log in to a remote server through their work or school so that they can access intranet resources as well as surf the web. According to geolocation service MaxMind, some larger internet connection services like AOL may cause surfers to appear to be in one country, like the US, even if they are actually surfing from outside of the US. Because of issues like these, many SEOs use IP address proxies to test how search engine results may look to their international customers, searching from IP addresses that are more 'local' to those clients.

If you are curious about how your current IP address may be viewed right now, you may wish to try one or more free IP geolocation tools online such as GeoIP tool, Hostip, IP2Location, and others.

Summary

In a nutshell, Google results are not cast in stone. If you have a customer who wonders why their search results do not match the rankings that you've found, there are a few things that you may wish to cover in order to help them understand the nature of rankings.

1) Google does not update once a month; it is constantly updating its data.
2) Google has many data centers.
3) Not all of the data centers are exact clones of one another. They can differ in what underlying data they contain as well as how they sort results.
4) The location of your IP address as well as the location of your customer's IP address can affect search results.

While these 4 items are not the only factors that can affect search results (further discussion could be made regarding the differences that can be observed due to cookies, prior search history being saved and used to bias future searches, and other factors), they are among the issues that come up most often when speaking with clients and managers who are not SEO savvy. Being able to understand these concepts and explain them can be critical when a client or manager, who is usually not interested in knowing every aspect of search engine optimization, wants to know why their rankings may not display as expected when they perform a search.
