How Does the Google Search Engine Work?
When someone runs a query on Google Search, the results appear within a fraction of a second. Behind that speed is a complicated process that passes through Google's web server, index servers, and doc servers. The web server sends the query to the index servers (like looking up a topic in the index of a book), and from there the request goes to the doc servers (like turning to the page the index pointed you to), which retrieve the stored documents. The results then appear on your screen, and you can open any of them by clicking. If you are a web developer, you should know how this happens, since it affects your ranking and ultimately your web traffic. Three processes are involved: 1. crawling, 2. indexing, 3. serving results.
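To make the web server, index server, and doc server roles concrete, here is a toy sketch of the query flow. The data in INDEX and DOCS and the three function names are invented for illustration; Google's real servers are far more elaborate.

```python
# Hypothetical, hard-coded data standing in for the index and doc servers.
INDEX = {
    "crawling": [1, 2],
    "indexing": [2],
}
DOCS = {
    1: {"title": "What is crawling?", "snippet": "Bots discover new and updated pages."},
    2: {"title": "Crawling and indexing", "snippet": "How pages enter the index."},
}


def index_server(query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    matches = set(INDEX.get(words[0], []))
    for word in words[1:]:
        matches &= set(INDEX.get(word, []))
    return sorted(matches)


def doc_server(doc_ids):
    """Turn document ids into the titles and snippets shown to the user."""
    return [DOCS[doc_id] for doc_id in doc_ids]


def web_server(query):
    """Receive the query, consult the index, then fetch the documents."""
    return doc_server(index_server(query))
```

Calling web_server("crawling indexing") returns only the second document, because it is the only one listed under both words.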
Crawling: Crawling is the process by which Google's spiders (also called robots or bots) discover new and updated web pages and pass them on for indexing. Remember that Google does not accept payment to crawl a website more frequently; it earns its money from advertising through Google AdWords.
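One simple way a crawler could tell new or updated pages from unchanged ones is to compare a hash of each page's content with the hash saved on the previous visit. This is only a sketch of that idea, not Google's actual method; the function names and the `seen` dictionary are assumptions made for the example.

```python
import hashlib


def fingerprint(content):
    """Return a short, stable fingerprint of a page's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def pages_to_index(fetched, seen):
    """Return the URLs that are new or changed since the last crawl.

    fetched: dict mapping URL -> page content from the current crawl.
    seen:    dict mapping URL -> fingerprint from earlier crawls (updated in place).
    """
    changed = []
    for url, content in fetched.items():
        digest = fingerprint(content)
        if seen.get(url) != digest:
            changed.append(url)
            seen[url] = digest
    return changed
```

On the first crawl every page counts as new; on later crawls only pages whose content differs are sent on for indexing.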
Indexing: Indexing is the process by which Google compiles a massive index of all the words it sees and their location on each page. It also processes the information in key content tags and attributes, such as title tags.
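An index of "all the words and their location on each page" is usually called an inverted index. The sketch below builds a tiny one in memory; the function name and the sample pages are made up, and a real search index is far larger and stores much more than word positions.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the pages it appears on and its positions there.

    pages: dict mapping URL -> plain text of the page.
    Returns: dict mapping word -> {URL: [word positions]}.
    """
    index = defaultdict(dict)
    for url, text in pages.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            index[word].setdefault(url, []).append(position)
    return index


pages = {
    "a.html": "Google crawls the web",
    "b.html": "The web is large, the web grows",
}
index = build_index(pages)
```

Looking up index["web"] then shows that the word appears once on a.html and twice on b.html, together with the position of each occurrence.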
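Serving results: Serving results is the step in which matching pages are shown in response to a query. Their order depends on how relevant each page is to the query, and one of the factors is the PageRank of the page, a measure of its importance based on the links pointing to it from other pages. The quality of your content and how easily your site can be crawled therefore both matter.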
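To give a feel for PageRank, here is a small sketch of the simplified, published form of the algorithm, run on a made-up three-page site. It is an illustration only: the damping value of 0.85 and the iteration count are assumptions, and Google's real ranking combines many other signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank by repeated redistribution of rank along links.

    links: dict mapping each page -> list of pages it links to.
           Every linked page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links spreads its rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


site = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home"],
}
ranks = pagerank(site)
```

In this example "home" ends up with the highest score because both other pages link to it.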
How does a robot decide where to visit?
This depends on the robot; each one uses a different strategy. In general, robots start from a historical list of URLs, especially documents that link to many other pages, such as server lists, “What’s New” pages, and the most popular sites on the Web.
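The "start from a list of URLs and follow the links" strategy can be sketched as a breadth-first walk over a queue of pages still to visit. The WEB dictionary below is a made-up link graph that stands in for real pages, and `get_links` is a hypothetical callback; a real robot would fetch each URL over the network instead.

```python
from collections import deque


def crawl_order(seeds, get_links, limit=100):
    """Return the order in which pages are visited, starting from the seeds.

    seeds:     starting URLs, for example well-linked hub pages.
    get_links: function that returns the URLs linked from a given URL.
    limit:     maximum number of pages to visit.
    """
    unique_seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    frontier = deque(unique_seeds)
    seen = set(unique_seeds)
    visited = []
    while frontier and len(visited) < limit:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Made-up link graph used instead of real network requests.
WEB = {"hub": ["a", "b"], "a": ["c"], "b": ["a", "c"], "c": []}
order = crawl_order(["hub"], lambda url: WEB.get(url, []))
```

Starting from a page with many outgoing links, such as "hub" here, lets the robot reach the rest of the graph quickly, which is why such pages make good starting points.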
How does an indexing robot decide what to index?
A robot takes your content and may break it up, rearrange it, and insert it into its database. Some robots index only the HTML titles or the first few paragraphs; others parse the entire HTML and index every word.
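The three approaches mentioned above (titles only, first few paragraphs, or every word) can be compared with Python's standard html.parser module. The class name, function name, and strategy labels are invented for this sketch, and the "all" strategy simply keeps every piece of text the parser reports.

```python
from html.parser import HTMLParser


class PageText(HTMLParser):
    """Collect the title, the paragraphs, and all text of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self.all_text = []
        self._in_title = False
        self._in_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        if data.strip():
            self.all_text.append(data.strip())
        if self._in_title:
            self.title += data
        elif self._in_paragraph:
            self.paragraphs[-1] += data


def text_to_index(html, strategy="all", first_n=2):
    """Return the text a robot would index under the chosen strategy."""
    parser = PageText()
    parser.feed(html)
    parser.close()
    if strategy == "title":
        return parser.title.strip()
    if strategy == "first_paragraphs":
        return " ".join(p.strip() for p in parser.paragraphs[:first_n])
    return " ".join(parser.all_text)


PAGE = (
    "<html><head><title>My Site</title></head><body><h1>Welcome</h1>"
    "<p>First para.</p><p>Second para.</p><p>Third para.</p></body></html>"
)
```

With this sample page, the "title" strategy keeps only "My Site", while "first_paragraphs" keeps the first two paragraphs and ignores the heading and the third paragraph.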
What types of files can Google index?
Google can index the following types of files, among others:
Adobe Portable Document Format (.pdf)
Text (.ans, .txt)
Design and Content Guidelines for your website
Create a useful, information-rich site. Try to use text instead of images to display important content, because Google's spiders do not recognize text contained in images. Make sure your title elements and alt attributes are descriptive. Don’t create multiple pages, subdomains, or domains with substantially duplicate content.
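As a quick way to act on the alt attribute advice, the sketch below lists images whose alt text is missing or empty, using Python's standard html.parser module. The class and function names are made up for this example, and it checks only for presence, not for how descriptive the text is.

```python
from html.parser import HTMLParser


class AltChecker(HTMLParser):
    """Record the src of every <img> tag that lacks usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            # Treat an absent, valueless, or blank alt attribute as missing.
            if not (attributes.get("alt") or "").strip():
                self.missing_alt.append(attributes.get("src"))


def images_missing_alt(html):
    """Return the src values of images without alt text."""
    checker = AltChecker()
    checker.feed(html)
    checker.close()
    return checker.missing_alt


SAMPLE = (
    '<img src="logo.png" alt="Company logo">'
    '<img src="chart.png">'
    '<img src="photo.png" alt="">'
)
```

Running images_missing_alt(SAMPLE) reports chart.png and photo.png, the two images a search bot would have no description for.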
Technical guidelines: Allow search bots to crawl your site, and use the robots.txt file on your web server to tell them which parts of the site they may or may not crawl.
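To see how a well-behaved bot reads robots.txt, you can test your rules with Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs, and the can_crawl helper below are placeholders for this sketch; substitute your own file and paths.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: every bot may crawl everything except /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


allowed = can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/index.html")
blocked = can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/private/report.html")
```

Here `allowed` is True and `blocked` is False, so a bot that follows the rules would skip everything under /private/.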
Do not use cloaking: Make web pages for users, not for search engines. Write content with the user's benefit in mind, and do not write anything just to fill pages for search engines. Above all, do not show search engines different content from what you show users, a practice known as cloaking.




