November 15, 1998
What really happens after you submit?
Recently a customer e-mailed me with the following question:
"What exactly happens when one of the search engines spiders a site? I'm really not clear on what info is stored by the search engine to allow a certain keyword to be indexed. I guess what I'm asking is, for example, this URL is spidered http://www.intothewind.com/kites.html and then after the search engine has indexed the site, I do a search for the keyword kite. What exactly happens within the search engine when I hit the search button to cause that page to come up in a certain position?"
Answer: Nobody except the search engine developers themselves knows *exactly* what happens after you submit your page. However, I've talked with many industry insiders, and as a database programmer myself, I believe I can explain the process in sufficient detail.
Once you submit your page, the search engine will add your URL (page) to a list of sites to be "spidered". It can take anywhere from one day to several months for a search engine to index, or "spider", your site. Once the search engine spiders your URL, it will read all the words on the page and filter out the parts it does not want to store.
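Here is a minimal Python sketch of the "list of sites to be spidered" as a first-in, first-out queue. The class and method names (SubmissionQueue, submit, spider_next) are hypothetical. The download step is passed in as a function, so the sketch makes no real network requests. Ignoring duplicate submissions is also an assumption: no engine has published how its queue actually works.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Optional


class SubmissionQueue:
    """Hypothetical model of an engine's list of URLs waiting to be spidered."""

    def __init__(self) -> None:
        self._pending: deque[str] = deque()  # oldest submission on the left
        self._seen: set[str] = set()         # every URL ever submitted

    def submit(self, url: str) -> bool:
        """Add a URL to the end of the queue; duplicates are ignored (an assumption)."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self._pending.append(url)
        return True

    def spider_next(self, fetch: Callable[[str], str]) -> Optional[tuple[str, list[str]]]:
        """Take the oldest URL, fetch its text, and split it into words.

        Returns None when nothing is waiting. `fetch` stands in for the real
        download step, so the sketch runs without network access.
        """
        if not self._pending:
            return None
        url = self._pending.popleft()
        return url, fetch(url).split()


# Usage: a dictionary plays the part of the Web.
pages = {"http://www.intothewind.com/kites.html": "America's #1 Kite Catalog"}
queue = SubmissionQueue()
queue.submit("http://www.intothewind.com/kites.html")
print(queue.spider_next(pages.__getitem__))
```

The delay the article mentions corresponds to how long a URL waits in `_pending` before `spider_next` reaches it.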
For example, most engines save space and improve search speed by excluding many common words from their databases. These are called "stop" words and include words like "a", "and", "the", "or", etc. Check out the following paragraph from Don's kite page:
America's #1 Kite Catalog for 18 Years. Our FREE 96 page 1998 Kite
Catalog has the best selection of kites from around the world
The search engine might store only the following words in its database:
America's #1 Kite Catalog 18 Years FREE 96 page 1998 Kite
Catalog best selection kites around world
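The filtering in this example can be sketched in a few lines of Python. Only "a", "and", "the" and "or" come from the text above. The other entries in STOP_WORDS are assumptions, chosen so that the output matches the example list. The function name index_terms is hypothetical.

```python
from __future__ import annotations

# Only "a", "and", "the" and "or" are named in the article; the rest of this
# stop list is an assumption chosen to reproduce the example above.
STOP_WORDS = {"a", "and", "the", "or", "for", "our", "has", "of", "from"}


def index_terms(text: str) -> list[str]:
    """Return the words an engine might store: split on whitespace, strip
    trailing punctuation, and drop stop words (compared case-insensitively)."""
    terms = []
    for raw in text.split():
        word = raw.rstrip(".,;:!?")
        if word and word.lower() not in STOP_WORDS:
            terms.append(word)
    return terms


paragraph = ("America's #1 Kite Catalog for 18 Years. Our FREE 96 page 1998 Kite "
             "Catalog has the best selection of kites from around the world")
print(" ".join(index_terms(paragraph)))
# prints: America's #1 Kite Catalog 18 Years FREE 96 page 1998 Kite Catalog best selection kites around world
```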
We have yet to research exactly which words are excluded from search engine databases. This should not be of great concern, however, since the search engine automatically excludes these same words from the user's search behind the scenes. To convince yourself that some words are excluded, go to Lycos and search for the word "the". Lycos will find no matches even though the word "the" occurs millions of times throughout the Web.
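Searching for "the" finds nothing because the same filter runs on the query. The following is a minimal sketch of that behavior. It assumes a small made-up stop list, lowercase matching, and a simple inverted index (a table from each stored word to the pages that contain it). Treating a multi-word query as "every word must match" is also an assumption, since engines differed on this.

```python
from __future__ import annotations

from collections import defaultdict

# Assumed stop list for illustration; each engine keeps its own.
STOP_WORDS = {"a", "and", "the", "or", "of", "for", "has", "our"}


def terms(text: str) -> list[str]:
    """Lowercase, trim trailing punctuation, and drop stop words."""
    out = []
    for raw in text.lower().split():
        word = raw.rstrip(".,;:!?")
        if word and word not in STOP_WORDS:
            out.append(word)
    return out


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: map each stored word to the URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for term in terms(text):
            index[term].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Filter the query with the same stop list, then return the URLs that
    contain every remaining word. A query made only of stop words has nothing
    left to look up, so it matches no pages."""
    query_terms = terms(query)
    if not query_terms:
        return set()
    results = set(index.get(query_terms[0], set()))
    for term in query_terms[1:]:
        results &= index.get(term, set())
    return results


pages = {"http://www.intothewind.com/kites.html":
         "Our FREE 96 page 1998 Kite Catalog has the best selection of kites"}
index = build_index(pages)
print(search(index, "kite"))   # {'http://www.intothewind.com/kites.html'}
print(search(index, "the"))    # set()
```

Because "the" was never stored, no amount of pages containing it can produce a match.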
Each search engine also filters out other things, such as meta tags, comment tags, or whatever else it doesn't want to include in its database. Each search engine is different.
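For illustration only, Python's standard html.parser module can separate indexable text from markup. This sketch models a hypothetical engine that ignores META tags, comments, scripts and styles. Each real engine made its own choices here, and some did read META tags. The class and function names are made up for the example.

```python
from __future__ import annotations

from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect only the text between tags. Tag attributes (including the
    CONTENT of a <META> tag) and <!-- comments --> never reach handle_data,
    so they are dropped; text inside <script> and <style> is skipped explicitly."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:          # html.parser lowercases tag names
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = ('<HTML><HEAD><META NAME="keywords" CONTENT="kites, kite catalog">'
        '<TITLE>Into the Wind</TITLE></HEAD><BODY><!-- internal note -->'
        '<P>The best selection of kites</P></BODY></HTML>')
print(visible_text(page))   # Into the Wind The best selection of kites
```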
One of the biggest potential pitfalls for Webmasters is search engines that do not index the text within frames. Currently, several major engines do not support frames, including Excite, WebCrawler, and HotBot. For these engines, include text in the <NOFRAMES> tag. Another solution is to create pages (without frames) to act as "doorways" or alternate entrances to your site for the purpose of giving these engines something they can "see" and index.
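A frameset page contains almost no text of its own, which is why a frames-unaware engine sees so little on it. The sketch below assumes a hypothetical engine that never follows <FRAME SRC=...> links and indexes only the text inside <NOFRAMES>. It uses Python's standard html.parser module, and the names are made up for the example.

```python
from html.parser import HTMLParser


class NoFramesTextExtractor(HTMLParser):
    """Model of a frames-unaware engine: it does not follow <FRAME SRC=...>,
    so the only indexable text on a frameset page is inside <NOFRAMES>."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # > 0 while inside a <noframes> element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "noframes":            # tag names arrive lowercased
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "noframes" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth and data.strip():
            self.chunks.append(data.strip())


def frameset_text(html):
    parser = NoFramesTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


bare = ('<HTML><FRAMESET COLS="20%,80%"><FRAME SRC="menu.html">'
        '<FRAME SRC="kites.html"></FRAMESET></HTML>')
with_noframes = ('<HTML><FRAMESET COLS="20%,80%"><FRAME SRC="menu.html">'
                 '<FRAME SRC="kites.html"><NOFRAMES>Kite catalog with the best '
                 'selection of kites</NOFRAMES></FRAMESET></HTML>')
print(repr(frameset_text(bare)))     # '' : nothing to index
print(frameset_text(with_noframes))  # Kite catalog with the best selection of kites
```

Without a NOFRAMES section, this engine has nothing to index on the page at all.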
After "stop" words are filtered and any tags not supported by the engine are excluded, EVERYTHING else that remains is a potential "keyword" that could be used to find your page. However, to actually be found in an engine, you must *emphasize* the specific keywords and phrases you want your site to be found under. Your important keywords and keyword phrases must be strategically placed throughout your Web page for your site to appear among the top 10 to 30 matches. That's exactly what our online book (included with WebPosition) and this newsletter help you do. Being in the top 10 or 20 results can bring hundreds, or even thousands, of new visitors to your site each week for each keyword on which your site ranks near the top.
Search engines determine a site's "relevancy" with a complex "scoring" system that they try to keep secret. This system adds or subtracts points based on factors such as how many times the keyword appears on the page, where on the page it appears, how many words the page contains in total, and other variables. Points can be subtracted if you repeat a keyword too often, use too few words on the page, or do any of a dozen other things, whether on purpose or by mistake. The pages that earn the most "points" are returned at the top of the search results; the rest are buried at the bottom, never to be found.
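Here is a toy scoring function that shows how such a point system might behave. Every weight and threshold in it (the 20-point title bonus, the 50-word minimum, the 10% repetition limit) is invented for illustration. It mirrors the kinds of factors listed above, not any engine's real formula.

```python
from __future__ import annotations


def relevancy_score(words: list[str], keyword: str, title: str = "") -> float:
    """Toy relevancy score for one keyword. All weights and thresholds here are
    hypothetical placeholders; real engines keep their formulas secret."""
    words = [w.lower() for w in words]
    keyword = keyword.lower()
    total = len(words)
    count = words.count(keyword)
    if total == 0 or count == 0:
        return 0.0

    density = count / total
    score = 100 * density                  # how often the keyword appears
    if keyword in title.lower().split():
        score += 20                        # where it appears: in the title
    first = words.index(keyword)
    score += 10 * (1 - first / total)      # where it appears: nearer the top scores more
    if total < 50:
        score -= 15                        # penalty: too few words on the page
    if density > 0.10:
        score -= 50                        # penalty: keyword repeated too often
    return score


def rank(pages: dict[str, tuple[str, str]], keyword: str) -> list[str]:
    """Order URLs by score, highest first; pages maps URL -> (title, body text)."""
    scores = {url: relevancy_score(body.split(), keyword, title)
              for url, (title, body) in pages.items()}
    return sorted(scores, key=lambda url: scores[url], reverse=True)


normal = " ".join(["kite"] + ["word"] * 57 + ["kite", "kite"])  # 60 words, 3 x "kite"
stuffed = " ".join(["kite"] * 30 + ["word"] * 30)               # 60 words, 30 x "kite"
print(rank({"normal.html": ("Kite Catalog", normal),
            "stuffed.html": ("Kite", stuffed)}, "kite"))
# prints: ['normal.html', 'stuffed.html']
```

The stuffed page uses the keyword ten times as often as the normal page but still ranks lower, because the repetition penalty outweighs the extra points.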
All of this may sound complicated, and it can be if nobody is keeping track of the current rules for you. Armed with the right information, however, it's not hard to rank well on most keywords and phrases. Those of you who make the effort reap big rewards in the form of free traffic to your Web site.
Some keywords are more competitive than others, but the idea is to focus on a variety of search phrases (two or more words) since 80% of queries are for two or more keywords.
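Because most searches are phrases, it helps to check which two-word phrases a page actually contains and whether a target phrase appears with its words side by side. The following is a minimal sketch. The function names are hypothetical, and it assumes that a phrase matches only when its words appear consecutively and in order, which real engines did not necessarily require.

```python
from __future__ import annotations

from collections import Counter

PUNCTUATION = ".,;:!?\"()"


def words_of(text: str) -> list[str]:
    """Lowercase the text and split it into words, trimming surrounding punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def two_word_phrases(text: str) -> Counter:
    """Count every pair of adjacent words (a "bigram") on the page."""
    words = words_of(text)
    return Counter(zip(words, words[1:]))


def contains_phrase(text: str, phrase: str) -> bool:
    """True if all words of `phrase` appear side by side, in order, in `text`."""
    words, target = words_of(text), words_of(phrase)
    if not target:
        return False
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


page = "Our FREE 96 page 1998 Kite Catalog has the best selection of kites"
print(two_word_phrases(page)[("kite", "catalog")])  # 1
print(contains_phrase(page, "kite catalog"))        # True
print(contains_phrase(page, "kite selection"))      # False: the words are not adjacent
```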
