May 15, 2005
Google Reveals Ranking Components in Patent Application!
...by sketching their ideal search engine, Google dropped a truckload of clues about how their engine actually ranks sites!
Courtesy of SearchEngineNews.com | May 2005
Have you ever played the game Clue, where you run around a mansion collecting information to figure out who killed Mr. Boddy? Well, optimizing a site for Google is a lot like that game. SEOs collect clues, put two and two together, throw in some educated guesses and ...voila! ...it was Miss Scarlet in the Conservatory with the Candlestick! Or, in the case of search engine optimization, they win the game as they watch their site's SE ranking rise.
With Google's recently released Patent Application we've been handed an enormous clue as to what goes on behind the closed doors at Google. Mind you, this document does not provide answers, but it does point us in the direction of what Google can do and what they are considering doing in the future.
Let's take a closer look at the clues and see what we can find...
The patent, titled "Information Retrieval Based on Historical Data," was authored by several Google engineers and originally filed December 31, 2003. Now, over a year later, the patent has finally been published and offers several clues regarding how Google ranks pages, what Google considers to be spam, and how they go about detecting that spam.
While most of what is covered is not particularly surprising, it does help confirm a lot of what was previously educated speculation. The real questions are: how many of these ideas are actually being used at Google? ...and, if not in current use, will they be used in the future? ...and when?
Though we seriously doubt Google employs all of the proprietary methods they list in the document, the patent still covers a huge number of techniques they could use (or are reserving the rights to use) in the future. The patent itself takes a shotgun approach to covering as much ground as possible, as opposed to closely documenting any current or future methods.
The patent also makes frequent reference to the concept of using a web page's score to rank that page, as well as what factors play a role in raising and lowering that score. The implication here is, of course, that high-scoring pages will rank better than low-scoring pages.
Reading Between the Patent's Lines
So what does all of this mean to you and your optimization efforts? Let's run through the patent step-by-step:
Historical Data: The bulk of the document is devoted to ways in which Google could use domain and site history data to help determine relevancy. This patent application reveals what kind of history they might consider to be important when it comes to scoring web pages and identifying spam. Some factors include:
* How old a page is.
* How frequently page content changes over time.
* How links to a page are added and removed over time.
* The rate at which links are added or removed.
* How the anchor text of incoming links changes over time.
* The current and previous levels of traffic to a page.
* What position Google ranked the page in the past.
* The age of the domain the page is on and how long it's been registered for.
Document Age: The patent implies that Google closely watches the date documents are created, the average age of documents on the site, and how frequently the page changes over time. Document creation date is determined by the date on which the Google spider first indexes the document, either through its regular crawl or through the page being manually submitted to Google.
For example, a page that is one day old and has ten incoming links may have a higher score than a page that is ten years old and has 100 incoming links, since the younger page has a higher rate of link growth. However, an excessively fast rate of link growth may actually hurt the score of a page, since, to Google, that looks like artificial link accumulation.
The indication here is that older pages will generally be given higher scores than newer pages, but only if those older pages are continually adding new incoming links at a moderate pace. Also, Google takes into consideration the average age of all pages on a site. This means that a new page on an old site will score higher than an equally new page that resides on a new site.
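To make the arithmetic behind the one-day-old versus ten-year-old example concrete, here is a minimal Python sketch. It is our own illustration, not anything taken from the patent: the function names and the "suspicious" threshold of 50 links per day are assumptions chosen purely for demonstration.

```python
def link_growth_rate(incoming_links: int, age_days: float) -> float:
    """Average number of incoming links gained per day since the page appeared."""
    if age_days <= 0:
        raise ValueError("age_days must be positive")
    return incoming_links / age_days


def looks_artificial(rate: float, threshold: float = 50.0) -> bool:
    """Flag growth faster than a made-up threshold (hypothetical value)."""
    return rate > threshold


# The two pages from the example: one day old with 10 links,
# and ten years old with 100 links.
young_page = link_growth_rate(incoming_links=10, age_days=1)
old_page = link_growth_rate(incoming_links=100, age_days=10 * 365)

print(f"young page: {young_page:.3f} links/day")
print(f"old page:   {old_page:.3f} links/day")
print("young page grows faster:", young_page > old_page)
print("young page looks artificial:", looks_artificial(young_page))
```

The young page gains links at 10 per day against roughly 0.03 per day for the old page, which is why it could outscore the older page despite having far fewer links in total.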
However, for some time-sensitive queries, newer pages may be considered more favorable than older ones. For instance, Google might rank new pages more highly for a phrase like "nuclear option," since new pages will be more likely to be discussing issues in the current US Senate. Old pages, in contrast, might be about Cold War politics, and probably would not satisfy today's searcher.
Rate of Document Change: In general, pages that are updated frequently will score higher. However, Google may watch to see if all web pages on a site are updated at the same time, which may signal a navigation or site template change, as opposed to a page-content change. In such a case, pages may not get credit for being updated, since the change more likely applied to the look of the site, rather than to new information being added.
Individual pages changing frequently may signal a news or blog page, or some other important content, and indicate to Google that those pages should be spidered more frequently, and perhaps ranked higher for their targeted keywords.
When it comes to determining if a web page has been modified, Google will give greatest weight to changes in the title of a page, followed by changes in the anchor text of outgoing links on a page, then to changes in visible page text. Little weight is given to changes in JavaScript, on-page ads, date/time tags, comments, or navigational or template elements.
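One way to picture that ordering is as a weighted sum over the parts of a page that changed. The sketch below is only an illustration: the patent gives the ordering, not numbers, so every weight and every key name here is an assumption of ours.

```python
# Hypothetical weights. Only their relative order reflects the article:
# title > outgoing anchor text > visible text > everything else.
CHANGE_WEIGHTS = {
    "title": 3.0,
    "outgoing_anchor_text": 2.0,
    "visible_text": 1.0,
    "javascript": 0.1,
    "ads": 0.1,
    "date_time_tags": 0.1,
    "comments": 0.1,
    "navigation_or_template": 0.1,
}


def change_score(changed_parts):
    """Sum the weights of the page parts that changed between two crawls."""
    return sum(CHANGE_WEIGHTS.get(part, 0.0) for part in changed_parts)


# A new title counts for more than a template and script refresh combined.
title_edit = change_score(["title"])
cosmetic_edit = change_score(["javascript", "navigation_or_template"])
print("title edit:", title_edit)
print("cosmetic edit:", cosmetic_edit)
```

Under these made-up weights, a redesign that touches only scripts and navigation barely registers, which matches the idea that template changes may not earn credit as content updates.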
To make things even more challenging, the patent also takes into account the rate of change. This means that, even if a page is updated regularly, it still won't score as highly as a page that is increasing in its rate of change.
For example, imagine that you update your pages once a week and your competitor updates their pages once a month. If your competition switches to updating their pages once a week, they'll receive a higher score than you. This is because, even though you're both updating once a week, they will have accelerated their rate of updates, giving them the advantage.
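The comparison above boils down to measuring the change in update frequency rather than the frequency itself. Here is a small Python sketch of that idea. The function name is hypothetical, and treating "once a week" as four updates per month is our own rounding.

```python
def update_acceleration(previous_per_month: float, current_per_month: float) -> float:
    """Change in update frequency; a positive value means updates are speeding up."""
    return current_per_month - previous_per_month


WEEKLY = 4.0   # roughly four updates per month (approximation)
MONTHLY = 1.0  # one update per month

# You were already updating weekly; your competitor moved from monthly to weekly.
you = update_acceleration(previous_per_month=WEEKLY, current_per_month=WEEKLY)
competitor = update_acceleration(previous_per_month=MONTHLY, current_per_month=WEEKLY)

print("your acceleration:", you)
print("competitor's acceleration:", competitor)
print("competitor has the edge:", competitor > you)
```

Both sites now update at the same pace, but only the competitor shows a positive acceleration.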
The patent also discusses how a "stale" document, or one that hasn't changed for a given amount of time, should perhaps be ranked higher than a more recently updated document, if it gets clicked more often than more recently modified pages. This brings us to...
Traffic, Click Tracking, and User Data: Google gives us a few clues to indicate that pages that are clicked more frequently in the Google search engine results pages (SERPs) for a given search term may get a higher ranking.
The patent implies that Google is tracking not just how many clicks a search result is getting, but also what kind of traffic is going to the page in question. Higher relevance is awarded to pages that are accessed through browser bookmarks, for instance, since those pages are assumed to be more valuable to searchers.
High scores are also assigned based on the amount of time a user spends on a page. But how can Google know how long you're browsing a page and whether or not you accessed it from your bookmarks? Now you know what things like My Search History and the Google Toolbar are really for (cue ominous-sounding music).
And what happens if people used to spend 30 seconds on average looking at a page, but now only spend an average of 5 seconds looking at that same page? You got it. Page score goes down.
Even more Orwellian is that the patent indicates that Google may employ information stored in a user's browser cache and cookies to determine the popularity of a page.
Further, Google may look at the rate of traffic, whether it's increasing or decreasing, and in particular the seasonal variation in traffic. This could help them to identify sites that may be important during some parts of the year but not others. Examples could include travel sites, or sites that sell holiday related merchandise.
One of the more potentially controversial aspects of the patent involves the way Google may look at advertising traffic, and in particular how Google rates the "quality" of the advertisers. A page featuring ads that link to sites that Google trusts (such as their good buddies over at Amazon.com) will likely score higher than a page with ads that link to lower-traffic, less-trusted sites.
While this may make a certain amount of sense, it also raises some flags about the ways in which the online behemoths are able to use their power to collaborate and squeeze the little guy out of contention.
The patent also discusses assigning higher scores to those pages that have higher click-through rates on their ads. Obviously, the easiest way for them to track this is if you're running AdSense on your site. (Who knows, maybe this whole patent was just a devious attempt by Google to convince more people to put AdSense ads on their pages?)
Links: Google watches how quickly inbound links develop over time, including how quickly they disappear. Google's not just factoring in how many links you have now; they're also taking into account the way in which those links behave and analyzing trends in the growth and decline of the number of links a page has.
A downward trend in the number of links to a page may indicate that the page is "stale" and hence deserves a lower score, while an upward trend may indicate fresh content that deserves a higher score.
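A link trend of this kind can be estimated from a series of link-count snapshots. The sketch below fits a plain least-squares line through equally spaced snapshots and reads the sign of its slope. This is our own choice of technique for illustration; the patent does not say how a trend would be computed, and the sample counts are invented.

```python
def link_trend(counts):
    """Least-squares slope of link counts taken at equally spaced intervals.

    A positive slope means the page is gaining links; a negative one
    means it is losing them.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two snapshots")
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    numerator = sum((i - mean_x) * (c - mean_y) for i, c in enumerate(counts))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# Invented monthly snapshots of incoming link counts.
growing = link_trend([10, 20, 30, 40])
fading = link_trend([40, 30, 20, 10])

print("growing page slope:", growing)
print("fading page slope:", fading)
```

A page whose slope turns negative would be the "stale" case described above, while a steadily positive slope suggests fresh content.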
Changes in the anchor text of links pointing to a page may also indicate fresh content, as can other changes to the pages on which those incoming links appear. This means that it helps to not only have fresh content on your own page, but also for your page to be linked to from other fresh pages.
And, of course, the rate at which those new links are added is critical. Add them too fast and you risk a penalty. As the patent plainly states:
A typical, "legitimate" document attracts back links slowly.
By now, you know our advice here at SearchEngineNews.com has always been to create a natural-looking incoming link structure. This means growing your links gradually, avoiding run-of-site link purchases, varying your incoming link anchor text, and limiting, to a certain degree, your number of reciprocal links.
Much of this patent just reinforces that advice. You should always be asking yourself if your links fit a natural-looking development cycle.
Domain-Related Information: The patent also covers a few ways Google may distinguish between illegitimate and legitimate domains. One aspect is the length of time a domain is registered for. According to the patent, legitimate domains are more likely to be paid for several years in advance, while doorway and spam-heavy domains are rarely registered for more than a year in advance, since the owner knows they'll likely be penalized anyway. Does this mean you should go out and register your domains for the next ten years? If you plan on being in business that long, we can't see how it can hurt.
Google may also take into account the rate at which whois info, IP address, and name servers change for a given domain. Domains that are frequently being shuffled between owners and locations may be tagged as more likely to contain spam.
And if that hasn't ratcheted up your paranoia, the patent also indicates that Google may look at the other domains that share a name server with your own domain. If there are a lot of questionable domains on your name server, you may get lumped in with a bad crowd, causing your rankings to suffer.
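The three domain signals described above can be pictured as simple checks against a domain's history. The following Python sketch is purely illustrative: the field names, the thresholds, and the idea of returning a list of triggered signals are all assumptions of ours, not details from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DomainHistory:
    years_paid_ahead: float         # how far in advance the domain is registered
    record_changes_per_year: float  # whois, IP address, and name server changes
    flagged_neighbor_ratio: float   # share of questionable domains on the same name server


def spam_signals(
    domain: DomainHistory,
    min_years_ahead: float = 1.0,       # hypothetical threshold
    max_changes_per_year: float = 3.0,  # hypothetical threshold
    max_flagged_ratio: float = 0.5,     # hypothetical threshold
) -> List[str]:
    """Return the names of the domain-related signals this domain trips."""
    signals = []
    if domain.years_paid_ahead <= min_years_ahead:
        signals.append("short registration")
    if domain.record_changes_per_year > max_changes_per_year:
        signals.append("frequent record changes")
    if domain.flagged_neighbor_ratio > max_flagged_ratio:
        signals.append("questionable name-server neighbors")
    return signals


established = DomainHistory(years_paid_ahead=5, record_changes_per_year=0.5,
                            flagged_neighbor_ratio=0.1)
throwaway = DomainHistory(years_paid_ahead=1, record_changes_per_year=6,
                          flagged_neighbor_ratio=0.8)

print("established:", spam_signals(established))
print("throwaway:", spam_signals(throwaway))
```

Real-world scoring would presumably weigh such signals rather than treat them as pass or fail; the list form simply makes it easy to see which checks a domain trips.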
Avoid Artificial Flavors
The primary theme of this patent is that Google is looking to profile web page value based on factors that are naturally created and influenced, as opposed to those that are artificially manipulated. That's essentially what this is all about - finding ways to detect and score documents that deserve to rank highly even if search engines didn't exist.
Google Software Engineer Matt Cutts (one of the co-authors of the patent) has frequently stressed the importance of building pages for human traffic while trying to ignore the search engines. For those of us who market products online, that may be easier said than done, since so much of what we do is focused on getting our pages to rank higher in the major engines.
The question is, what is truly natural? Considering that SE optimization is built into almost every facet of online commercial activity, the line is somewhat blurred.
Keep Things in Perspective
Of course, it's also important to keep in mind that this is only a patent application. There's no proof that any of these ideas are actually being implemented at Google. For all we know, this patent could just be a brilliant attempt to foster paranoia amongst search engine optimizers and make us second-guess the techniques we're currently using.
Further, note that this patent was not put out by Google, but rather by a team of Google engineers. Of course, we're not so naive as to assume that the two aren't one and the same.
Regardless, it does show that Google is quite intent on discovering pages that develop a natural course of growth, as opposed to pages that search engine marketers try to put on the Fast Track to high rankings.
See for Yourself...
Since the patent itself is actually fairly readable, we recommend you check it out yourself (skip the claims and go straight to the description, about 2/5 of the way down). The contents of this patent application have already generated more online buzz than any other previously released Google publication.
Google USPTO Filing - A must read for SEOs
As we've always said: when it comes to Google, if it can be done, then assume it is being done. Familiarizing yourself with this document will give you an idea of just what Google is striving for and will prepare you for what you can expect to see, if not today, then very likely in the near future.
Keep it natural...
Esoos Bobnar
SEM Specialist and Technical Analyst
Planet Ocean Communications
The above article is reprinted with permission from Planet Ocean's SearchEngineNews.com, copyright 2005, and distributed with permission by WebTrends, maker of WebPosition, the award-winning software that helps track and improve your search engine rankings. You may download a FREE trial copy of WebPosition from: http://www.webposition.com/trial/
