
June 15, 2001

Exclusive Interview: Learn Inside Secrets of How Google Works!

As most of you know, Fredrick Marckini is the founder and CEO of iProspect.com, a leading search engine positioning services firm, and a frequent contributor to the MarketPosition Newsletter. He was generous enough to provide us with articles based on several recent exclusive interviews he was granted with senior executives and engineers at major search engines and directories. These interviews were conducted in the course of writing his newest book, "Search Engine Positioning," the follow-on to "Achieving Top-10 Rankings In Internet Search Engines."

We will continue to feature articles developed from these interviews in future issues of the MarketPosition newsletter, giving you unprecedented information and insight into each search engine interviewed.

Mr. Marckini's latest book, "Search Engine Positioning," is in bookstores now. The information gained from these interviews will be presented in his 500+ page book's "how-to" engine-by-engine chapters, but you're getting it first right here from MarketPosition!

Google Interview by Fredrick Marckini

With over 1.3 billion documents currently indexed and roughly 29 million daily searches, Google has emerged as a search powerhouse for the 21st century. Partnered with the industry powerhouse, Yahoo!, Google has established itself as a market leader. Much of Google's strength can be credited to their uncompromising attitude towards search. Google strives to make their search experience "an easy, honest and objective way to find high-quality websites with information relevant to your search."

Read more about Google's technology here.

Google's lofty aspirations and revolutionary Web page analysis tools continue to drive more and more users to their search engine. I had a chance to talk with Craig Silverstein of Google for an exclusive interview. We covered a number of topics including Google's multifaceted ranking algorithm, shedding light on some of its various components. Most importantly for readers of MarketPosition, Silverstein divulged how their ranking technique differs from the other major search engines and what factors are likely to give your Web site a higher ranking. What was most apparent after our interview with Google was this: Search engine positioning is a process, not a project. You can't simply press one button and find your site launched to the top of the search results.

Search engine visibility is gained over time. That's why tools such as WebPosition Gold are so important for companies seeking to grow their search engine visibility. It's an ongoing process that must address an ever-evolving search engine landscape and how your Web site will interface with those many search engines. Remember, you heard it from iProspect.com first: search engine positioning is a process, not a project. iProspect.com's interview with Google revealed how intricate Google's page scoring algorithm has become.

PageRank

PageRank is one of the fundamental aspects of Google's page-scoring algorithm. Google describes PageRank as follows:

"PageRank relies on the uniquely democratic nature of the Web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves 'important' weigh more heavily and help to make other pages important."

Keep in mind that PageRank does not consider outbound links. Therefore, the links on your Web page to other sites across the Web have no impact on your PageRank score. However, outbound links are important for establishing your page's reputation as a source, an "authority" on a topic. Furthermore, remember that PageRank is calculated on a page-by-page basis, so different pages within one domain are likely to have different PageRanks. It should be pointed out that PageRank does consider links that are within the same domain. Hence, pages within a domain linking to another page within that same domain impact PageRank - if there is a page in your Web site that all the other pages of your Web site link to, it will enjoy a higher PageRank score and may rank better than other pages in your Web site. When you consider that most Web pages have a link to "home," it's no wonder that a site's home page can enjoy a higher ranking than internal pages.
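To make the "voting" idea concrete, here is a minimal sketch of a simplified PageRank calculation. It is not Google's production code: the damping value of 0.85, the iteration count, and the four-page example site are all assumptions chosen for illustration. It does show why a page that every other page links to, such as "home," ends up with the highest score.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute a simplified PageRank score for every page.

    links maps each page to the list of pages it links to. Every link
    target is assumed to appear as a key in links as well.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if not targets:
                # A page with no links spreads its vote evenly over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
            else:
                # A page's vote is split evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page site: every internal page links back to "home".
site = {
    "home": ["products", "about", "contact"],
    "products": ["home"],
    "about": ["home"],
    "contact": ["home"],
}
scores = pagerank(site)
best = max(scores, key=scores.get)
print(best)  # home
```

In this toy site the three internal pages score identically, because each receives the same single link from the home page.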

Craig pointed out something very interesting during our talk: External links that you grant from a particular page on your Web site can become diluted. In other words, if you place 10,000 links to other Web pages from a particular page of your Web site, each link is less powerful than if you were to link to only five other Web pages. Put another way, the more links you grant from a page, the less each individual link contributes to the Web site it points to.
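The arithmetic behind dilution can be sketched in a few lines. The even split used here is an assumption (Craig did not give a formula in the interview), and the function name is hypothetical, but it captures the idea that a page has a fixed vote to hand out.

```python
def vote_per_link(page_score, outbound_links):
    """Split a page's vote evenly across all of its outbound links."""
    if outbound_links <= 0:
        raise ValueError("a page needs at least one outbound link to pass a vote")
    return page_score / outbound_links


few = vote_per_link(1.0, 5)        # page links to only five other pages
many = vote_per_link(1.0, 10_000)  # page links to 10,000 other pages
print(few, many)  # 0.2 0.0001
```

Under this assumption, each of the five links carries 2,000 times the weight of each of the 10,000 links.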

PageRank and Search

"While PageRank helps clarify the quality or importance of a Web page, it provides no insight into how well that Web page matches your particular information need. Important, high-quality sites receive a higher PageRank, which Google remembers each time it conducts a search. Of course, important pages mean nothing to you if they don't match your query contextually. Google combines PageRank with sophisticated text-matching techniques to find pages that are both important and relevant to your search. Google goes far beyond the number of times a term appears on a page and examines all aspects of the page's content (as well as the content of the pages linking to it) to determine if it's a good match for your query". So what other factors have weight in Google's ranking algorithm?

The Term Vector Theory

Google's algorithm incorporates the ideas and understanding behind the term vector theory. While the elements of the term vector theory can be quite complex, Craig offered a rather basic definition of how the theory originated. A premise of the term vector theory "says the documents are good if they contain the words in your query and they contain them a lot," explained Silverstein. As search has matured and grown more complex, Google has adapted their algorithm to complement these changes and to account for those who try to cheat and trick the search engines. While the algorithm has adjusted with the times, in essence it still embraces the beliefs behind the term vector theory.

Scoring = PageRank + Term Vector

The term vector factors of the Google ranking algorithm, which will be covered below, concentrate on how relevant a page is to a user's search. This score, combined with the PageRank score that measures the popularity of the page, is how Google derives an overall score or ranking of a Web page. Thus, the Web pages that receive high scores are, in Google's opinion, the Web pages that best meet the user's individual needs.
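As a rough illustration of how two such scores might be combined, consider the sketch below. Google did not disclose how the scores are actually merged, so the weighted sum, the 50/50 weighting, the length-normalized word count, and the function names are all assumptions made for the example.

```python
def term_score(query, text):
    """Fraction of the words in the text that are query words."""
    words = text.lower().split()
    if not words:
        return 0.0
    query_words = set(query.lower().split())
    hits = sum(1 for word in words if word in query_words)
    return hits / len(words)


def overall_score(query, text, pagerank, relevance_weight=0.5):
    """Blend relevance with PageRank; an irrelevant page scores zero."""
    relevance = term_score(query, text)
    if relevance == 0.0:
        # An important page means nothing if it does not match the query.
        return 0.0
    return relevance_weight * relevance + (1.0 - relevance_weight) * pagerank


query = "thai restaurants"
popular = overall_score(query, "thai restaurants and thai food", pagerank=0.8)
obscure = overall_score(query, "thai restaurants and thai food", pagerank=0.2)
off_topic = overall_score(query, "used cars for sale", pagerank=0.9)
print(popular > obscure > off_topic)  # True
```

Two pages with identical text are separated by their PageRank, while a high-PageRank page that never mentions the query words drops out entirely.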

The following are some of the on-the-page considerations that Craig revealed that apply to Google's term vector portion of their ranking algorithm. These factors may be the difference between getting listed in one of today's most influential search engines and being left out in the dark.

Word Proximity

As discussed in the term vector theory, the presence and number of times the query words appear in the document has significance. However, Google also takes the word or phrase proximity into account. For instance, if a search is conducted using the expression "Thai restaurants in Cleveland," how closely these words appear to each other within the document has consequence. If all the words appear within the document multiple times, but the word "Thai" is nowhere near "restaurants" and the word "Cleveland" is also alienated, the page's ranking is likely to be diminished.
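One simple way to measure proximity is to find the shortest run of words that contains every query word. The sketch below does exactly that; it is an illustration of the concept under our own assumptions, not the measure Google uses, and the function name and sample sentences are hypothetical.

```python
import re


def smallest_window(query, text):
    """Length, in words, of the shortest span of text containing every
    query word, or None if some query word never appears."""
    needed = set(re.findall(r"[a-z0-9]+", query.lower()))
    words = re.findall(r"[a-z0-9]+", text.lower())
    best = None
    for start in range(len(words)):
        if words[start] not in needed:
            continue
        seen = set()
        for end in range(start, len(words)):
            if words[end] in needed:
                seen.add(words[end])
            if seen == needed:
                length = end - start + 1
                if best is None or length < best:
                    best = length
                break
    return best


query = "Thai restaurants in Cleveland"
close = "The best Thai restaurants in Cleveland are downtown"
far = "Thai silk is sold in many shops. Cleveland has museums, parks and several restaurants"
print(smallest_window(query, close))  # 4
print(smallest_window(query, far))    # 14
```

Both sample sentences contain all four query words, but only the first keeps them together, so a proximity-aware engine would be expected to favor it.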

HTML Mark-Up

A more recent factor that has been added to the Google algorithm looks at your page's HTML. Factors such as what section the word appears within the page, the size of the font, and the use of bold text affect the ranking of a Web page. Silverstein expressed that words in tiny font are considered less important in their ranking formula. In addition, words in the title and heading sections, and words in large font and in bold are given higher rank consideration.
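The sketch below shows how a scorer could weight a keyword by the HTML element it appears in, using Python's standard html.parser module. Silverstein named the signals but not their values, so every weight in TAG_WEIGHTS is invented for illustration, as are the class and function names.

```python
import re
from html.parser import HTMLParser

# Hypothetical weights: the interview names the signals, not their values.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.5, "h2": 2.0, "b": 1.5, "strong": 1.5, "small": 0.5}


class EmphasisScorer(HTMLParser):
    """Adds up occurrences of one term, weighted by the enclosing tag."""

    def __init__(self, term):
        super().__init__()
        self.term = term.lower()
        self.open_tags = []  # stack of currently open weighted tags
        self.score = 0.0

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        count = re.findall(r"[a-z0-9]+", data.lower()).count(self.term)
        if count:
            # Use the innermost weighted tag, or 1.0 for ordinary body text.
            weight = TAG_WEIGHTS[self.open_tags[-1]] if self.open_tags else 1.0
            self.score += count * weight


def emphasis_score(term, html):
    scorer = EmphasisScorer(term)
    scorer.feed(html)
    scorer.close()
    return scorer.score


plain = emphasis_score("thai", "<p>Thai food</p>")
heading = emphasis_score("thai", "<h1>Thai food</h1>")
tiny = emphasis_score("thai", "<small>Thai food</small>")
print(heading > plain > tiny)  # True
```

With these made-up weights, the same word counts for more in a heading than in body copy, and for less in tiny text.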

Web Site Size

Craig revealed that Web pages are graded individually; therefore there is neither consequence nor value for being part of a larger Web site. This is an important piece of information to those optimizing large corporate sites or four-page sites - technically, all are equal.

Inbound Links

As mentioned in the PageRank discussion, the number of pages linking to your Web page affects your PageRank score. However, inbound links also affect the term vector score, separately from PageRank: Google considers the anchor text of a link in determining the page's term vector score. Silverstein expanded on this topic. "When we consider the text of the page we not only consider what the author wrote and actually put on the Web page, we also consider what other people wrote when they linked to that Web page." Well, what about people who link to your site using inappropriate keywords? Could this hurt your rankings? Craig explained that Google takes this into account when investigating anchor text, and that off-topic anchor text in links to your Web site will not adversely impact your rankings under Google's algorithm. Likewise, a link from an unrelated Web site does not harm the site's term vector rank. In short, it does not matter whether the site linking to you is related to yours; what matters is the text that describes the link to your site.
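A small sketch makes the anchor text point clearer. Here a term is counted both in the page's own copy and in the text of links pointing at the page. The simple additive count, the function name, and the sample data are assumptions for illustration only.

```python
def term_count(term, page_text, inbound_anchor_texts):
    """Count a term in the page's own words plus the anchor text of
    links that point to the page."""
    term = term.lower()
    on_page = page_text.lower().split().count(term)
    in_anchors = sum(
        anchor.lower().split().count(term) for anchor in inbound_anchor_texts
    )
    return on_page + in_anchors


page_text = "Welcome to our family kitchen"
on_topic_only = ["best thai restaurant"]
with_off_topic = ["best thai restaurant", "click here", "cheap flights"]

print(term_count("thai", page_text, on_topic_only))   # 1
print(term_count("thai", page_text, with_off_topic))  # 1
```

The page never uses the word "thai" itself, yet it picks up the term from a link pointing to it, and the off-topic anchors neither add to nor subtract from that count.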

Meta Data

Don't bother fine-tuning your meta data for Google's engine. Google ignores both the meta keywords and meta description. Silverstein noted that Google doesn't view meta data as trustworthy so they completely overlook this section. Furthermore, Google uses what they call a "snippet" to describe the contents of the page in the search results, instead of what most search engines use, the contents of the meta description tag. The "snippet" has advantages over the meta description in that the "snippet" places the search query in the context of its use in the body of the Web site, allowing the user to determine if the Web page is what they are looking for.

Dynamic Content

Google has begun to crawl and index dynamically generated Web sites. Craig mentioned that crawling dynamic pages is fairly new to Google's service and that it is still a work in progress. Silverstein also realizes that crawling these types of pages is dangerous in that dynamic pages are generated on the fly, leaving the potential to get caught up in an infinite space of page generation (there is the potential that a search engine spider could get caught in a recursive trap, indexing thousands of years of calendar months by continuing to follow the "next month" for example). Although Google is approaching dynamic pages slowly and cautiously, they are now included in the Google database and that's good news for many.
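The "next month" trap is easy to reproduce in miniature. In the sketch below, a hypothetical calendar site generates a new page for every month, forever, and the crawler protects itself with a simple page budget. The budget approach, the URL pattern, and the function names are our own assumptions; Craig did not describe how Google guards against such traps.

```python
from collections import deque


def calendar_links(url):
    """Hypothetical dynamic site: every month page links to the next month."""
    month = int(url.rsplit("=", 1)[1])
    return [f"/calendar?month={month + 1}"]


def crawl(start_url, get_links, max_pages=100):
    """Breadth-first crawl that stops after visiting max_pages pages."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


pages = crawl("/calendar?month=1", calendar_links, max_pages=24)
print(len(pages))  # 24
print(pages[-1])   # /calendar?month=24
```

Without the max_pages limit, this crawl would never finish, because the site can always generate one more page.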

Web Site Submissions

Google does not encourage Web page submissions. Craig mentioned that Google feels "that the best way to get an accurate score is when we discover a page all on the links because then we can calculate the PageRank."

Although Google feels that submission is not entirely necessary and that a quality site will be found without it, Google does offer Web site owners an opportunity to submit their sites. The following Google Web page, http://www.google.com/addurl.html, allows users to add their URL and any comments they may have. The page mentions, however, that Google cannot make any predictions or guarantees about when or if a site will appear within their database. Don't bother submitting, though, if your site is so new that no one links to it. Craig indicated that while Google might spider your site, it won't be added to the database until someone else links to it. Webmasters, take note: Craig indicated that even one external link starts a Web site on the road to being found in Google. Large companies should pay special attention when launching a new product Web site - communicate with other product managers and demand links to your newest sites from their existing ones. At iProspect.com, we're often startled at how much leverage is available to Fortune 1000 companies as a result of the many Web sites they maintain, and at how few take advantage of it.

Here's a wake-up call from iProspect.com to America's largest companies: "if you're not properly inter-linking your many product Web sites, if you don't have a company-wide strategy to leverage the power of your many Web properties in the major search engines, you lose, no matter how many search engines you submit to." Search engine positioning is necessarily an ongoing process. As we continue to say, "search engine positioning is a process, not a project." If you're a large company and you need consultation in building a properly cross-linked community of your own Web properties, contact iProspect.com. The opportunity cost of failing to understand how the search engines interact with your many Web sites can be staggering.

Google's Database

Google conducts crawls and updates their database on a monthly basis. Google does not promise to crawl an entire site. Google does claim, however, that they try to crawl what they consider to be the "most important" pages. It should also be noted that Google does not intrinsically limit the total number of pages that they will crawl on a large site.

Combating Spam

A key advantage of the PageRank rating system is that it is a fundamental tool in fighting spam. Silverstein articulated that PageRank makes Google virtually immune to spam because even if a Web page makes it into Google's database by using unethical methods, there is very little chance it will receive a high PageRank rating. The one place Craig did notice where spam might be able to creep in is where there are no solid results for a particular query. In this situation, since there are no "good" Web sites containing these phrases, low-PageRank sites have the opportunity to gain a top ranking, since there are no pages with a high PageRank to drown out the spam.

Sites that are caught abusing Google's "search engine power" are dropped from the database and banned forever. Craig specifically mentioned cloaking as an example of this abuse and said that companies that employ cloaking technology will be banned if and when they're caught: "Right now our policy is that cloaking is forbidden." Google feels it cannot score a page if it is served a different experience than the user.

For companies that have built Web sites without much text, where the copy is represented in the graphics, Craig suggests that they employ ALT tags and either re-create the body copy contained in the graphics in the ALT tags or attempt to describe the "concepts" depicted in the graphics. Now this next bit was particularly interesting, not just because it reinforced Google's policy against cloaking, but because it revealed how much weight Google places on page layout and keyword placement within a page's HTML. In a follow-on question about cloaking, I asked if Google's policy against cloaking was absolute. For instance, what if a company serves the exact same content that's presented in the Web page, but as a text-only version of that Web page? In theory, there could be no harm in serving a cloaked text-only version of a graphics-only un-indexible Web page if that text-only page matched the un-indexible page word-for-word, right? Wrong. According to Craig, Google would consider this "harmless" cloaking harmful too, and you would be banned if caught. Here's an excerpt from the actual exchange that hinted at the level of importance that Google places on page layout:

"iProspect.com: Would you be intrinsically opposed to someone serving a text only page that is identical textually to a page that is all graphics where there's nothing indexible?"

"Google: Whether there's all graphics or there's nothing indexible we encourage people to use ALT text for the pictures to describe the concept the pictures have. If, on the other hand they want to make a totally different page that's more of a gray area."

"iProspect.com: We're trying to make the distinction here that in this hypothetical example, the caveat is that [the cloaked Web site] would be word-for-word identical to what's displayed in the graphics."

"Google: But of course the layout's different. We do look at layout in order to determine scoring."

Craig went on to say that the problem in our hypothetical example was that the text-only version of our un-indexible site would have to faithfully reproduce the parts of the page that were given "emphasis," such as items in bold, headlines, italics, etc. It seems that you cannot overestimate the amount of emphasis that Google places on keyword placement in the layout of a page's HTML.

The Monetization of Google

I took the chance to ask Craig whether or not Google was considering selling rankings within their search results in the near future. Silverstein pointed out that Google is proud of the fact that their search engine serves unbiased results. While some engines opt to sell a high ranking for particular search expressions, Google prides itself on presenting impartial and fair results.

Clearly, Google takes satisfaction in keeping their users happy by presenting them with accurate and timely results. As more and more engines seek profitability and decide to monetize their search, Google focuses their efforts on user satisfaction. Evidently, Google's strategy and focus have been effective. With 29 million daily searches and growing, Google continues to climb the ranks as one of the top search engines.
