Sometimes referred to as ‘spiders’ or crawlers, automated search engine robots seek out web pages for the user. Just how do they accomplish this and is this of importance? What is the real purpose of these robots?

A search engine robot is a fairly simple program with just enough functionality to read web pages. Its ability to interpret websites is limited: spiders cannot interpret frames, Flash video, images, or JavaScript; they cannot enter password-protected areas or click buttons; and they can be stopped by dynamically generated URLs and JavaScript navigation. Within plain HTML code, however, they can retrieve data and follow links, travelling from page to page across the web.
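To make the limitation concrete, here is a minimal sketch in Python of how a simple spider might pull links out of HTML. It is not how any particular search engine works; the class name `LinkExtractor`, the helper `extract_links`, and the sample page are all made up for illustration. It assumes the spider reads only plain `<a href>` links and never runs scripts or clicks buttons.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from plain <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only ordinary anchor tags are considered; nothing is executed.
        if tag != "a":
            return
        for name, value in attrs:
            # Skip JavaScript pseudo-links, which a simple spider cannot follow.
            if name == "href" and value and not value.startswith("javascript:"):
                self.links.append(value)


def extract_links(html):
    """Return the list of followable links found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


# Illustrative page: one plain link, plus navigation that depends on JavaScript.
SAMPLE_PAGE = """
<html><body>
<a href="/about.html">About</a>
<a href="javascript:openMenu()">Menu</a>
<button onclick="location.href='/hidden.html'">Go</button>
<script>window.location = '/scripted.html';</script>
</body></html>
"""

print(extract_links(SAMPLE_PAGE))
```

Only `/about.html` is found. The pages reachable through the button, the JavaScript menu, and the script stay invisible to this kind of reader, which is why plain HTML links matter.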

The ‘submit URL’ function places your URL into a list of URLs the robots are going to explore. Even if you never submit your URL directly, robots will try to find your site by following links from other pages. That’s why building visibility through a web of links is important.

By collecting and following links, robots manage to transport themselves all over the internet. Think of links as the internet equivalent of the roads we use in our lives. Robots travel along the roads and read the signposts so they know what leads where.
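The road analogy can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real crawler: `TOY_WEB` is an invented in-memory site map standing in for real pages, and `crawl` is a hypothetical function that keeps a list of URLs to explore, much like the list a submitted URL is added to.

```python
from collections import deque

# Hypothetical site map: each URL maps to the links found on that page.
TOY_WEB = {
    "/": ["/products", "/about"],
    "/products": ["/products/widget", "/"],
    "/about": ["/contact"],
    "/products/widget": [],
    "/contact": ["/"],
    "/orphan": ["/"],  # no page links here, so it is never discovered
}


def crawl(start, web):
    """Visit pages breadth-first by following links from a starting URL."""
    to_visit = deque([start])  # the list of URLs still to explore
    seen = {start}             # URLs already queued, to avoid loops
    order = []                 # pages in the order the robot reached them
    while to_visit:
        url = to_visit.popleft()
        order.append(url)
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                to_visit.append(link)
    return order


print(crawl("/", TOY_WEB))
```

Starting from `/`, the robot reaches every page that some other page links to. `/orphan` is never visited because no road leads to it, which illustrates why incoming links help a site get found.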

To give searchers the most relevant results for their query, the search engine runs quick calculations on the information its robots bring back. You can check your server logs or the output of a log statistics program to see which pages robots have visited and how often. Some robots are easy to identify, such as Google’s ‘Googlebot’, while less well-known ones, such as Inktomi’s ‘Slurp’, are harder to recognise. Some robots even appear to be ordinary browsers operated by a person.

Once in the database, the information becomes part of the search engine directory and ranking process. Indexing is based on how the search engine engineers have decided to evaluate information returned by the spiders. When you enter a query into a search engine, it uses several calculations behind the scenes to determine which results you’re most likely looking for, out of the sites the spiders have returned. The database selects the best matches and displays them. The database is constantly updated by spiders crawling websites over and over again, to make sure that the most up-to-date information is available.

If you’re interested in seeing which pages the spiders have visited on your website, you can check your server logs or the results from your log statistics program. From this information you’ll know which spiders have visited, where they went, when they came, and which pages they crawl most often. Some are easy to identify, such as Google’s ‘Googlebot,’ while others are harder: ‘Slurp’ from Inktomi, for example. As well as identifying which spiders visit, you can find out whether any of them are draining your bandwidth, so that you can block them from your site. The internet has plenty of information on identifying these bad bots. Certain things can also prevent good spiders from crawling your site, such as the site being down or under a very heavy traffic load. This can keep your site from being re-indexed, though most spiders will eventually come by again and retry the page.
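As a rough illustration of the log check described above, the Python sketch below counts spider visits per page. It rests on several assumptions: that the server writes the common "combined" access log format, that a spider can be recognised by a token in its user-agent string, and that the sample lines (with made-up addresses and dates) resemble your own logs. The names `LOG_LINE`, `SPIDER_TOKENS`, and `spider_hits` are invented for this example.

```python
import re
from collections import Counter

# Assumed layout: the "combined" access log format used by many web servers.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Tokens to look for in the user-agent string (illustrative, not exhaustive).
SPIDER_TOKENS = ("Googlebot", "Slurp")


def spider_hits(lines):
    """Count visits per (spider, page) for lines that match a known token."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent").lower()
        for token in SPIDER_TOKENS:
            if token.lower() in agent:
                hits[(token, match.group("path"))] += 1
                break
    return hits


# Made-up sample lines; the addresses come from a documentation-only range.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [10/Oct/2023:14:02:11 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.44 - - [10/Oct/2023:14:05:09 +0000] "GET /contact.html HTTP/1.1" 200 1184 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.77 - - [10/Oct/2023:14:06:40 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

for (spider, page), count in sorted(spider_hits(SAMPLE_LOG).items()):
    print(spider, page, count)
```

Keep in mind that a user-agent string is only what the visitor claims to be. Since some robots present themselves as ordinary browsers, a count like this is a starting point rather than a complete picture.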

Justin Harrison is an internationally recognised Internet Marketing expert who provides world class SEO Services to website owners. For more information visit: http://www.seorankings.co.za