Who hasn’t said, “It’s okay, I’ll Google it”? This has become the standard response when we do not know the answer to a question. But with millions upon millions of sites on the web, how do we know Google will come up with the right answer? How does Google find a restaurant or business in our general vicinity when we need one? The web grows rapidly every day, so how does Google keep coming up with answers to our queries?
Make-up of Google
Google is set up as a distributed network of computers that carries out very fast parallel processing. Parallel processing performs many calculations at the same time, which means that data is processed quickly. Three main components make up Google:
- Googlebot – web crawlers that find and fetch pages
- Indexer – sorts out words on pages and stores the index in a huge database
- Query Processor – compares search queries to the index and then recommends documents that are deemed relevant
The Googlebot is a robot that crawls the web, finds pages and hands them over to the Google indexer. It has been compared to a spider scurrying across the strands that make up cyberspace. In reality, however, the Googlebot doesn’t travel across the web at all. Instead, it functions like a web browser: it sends requests to web servers for pages, downloads them and gives them to the Google indexer.
The Googlebot is made up of numerous computers that request and gather pages far more quickly than a person could with a web browser. In fact, the Googlebot can request thousands of pages at one time. To avoid overwhelming web servers, it deliberately requests pages from each individual server much more slowly than it is capable of doing.
Google stays current by continuously recrawling pages that change. These “fresh crawls” look mainly at popular pages that are updated regularly, and they help Google keep up with ever-changing web content. They are balanced by what is considered a “deep crawl”, in which Google takes a deeper look at web content.
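The fetch-and-slow-down behaviour described above can be illustrated with a small sketch. This is not Googlebot’s actual code: the function name `crawl`, the `min_delay` parameter, the stand-in `fake_fetch` function and the example URLs are all assumptions made for illustration, and time is simulated with a counter so that the example runs without touching the network.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(seed_urls, fetch, min_delay=1.0, max_pages=10):
    """Breadth-first crawl that spaces out requests to the same host.

    `fetch(url)` is a caller-supplied function returning (html, links).
    Time is simulated, so nothing here actually sleeps or uses the network.
    """
    queue = deque(seed_urls)
    seen = set(seed_urls)
    next_allowed = {}  # host -> earliest simulated time for its next request
    clock = 0.0
    pages = {}         # url -> html that would be handed to the indexer
    schedule = []      # (time, url) pairs, recorded to show the pacing

    while queue and len(pages) < max_pages:
        url = queue.popleft()
        host = urlparse(url).netloc
        # Wait (in simulated time) until this host may be contacted again.
        clock = max(clock, next_allowed.get(host, 0.0))
        html, links = fetch(url)
        pages[url] = html
        schedule.append((clock, url))
        next_allowed[host] = clock + min_delay
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages, schedule


# A tiny made-up "web" standing in for real servers (hypothetical URLs).
FAKE_WEB = {
    "http://a.example/": ("home a", ["http://a.example/about", "http://b.example/"]),
    "http://a.example/about": ("about a", []),
    "http://b.example/": ("home b", ["http://a.example/"]),
}


def fake_fetch(url):
    return FAKE_WEB.get(url, ("", []))


pages, schedule = crawl(["http://a.example/"], fake_fetch, min_delay=2.0)
```

In this run the second request to a.example is held back until the delay has passed, while the first request to b.example is not delayed any further because that host has not been contacted yet.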
Once the Googlebot has crawled the web and gathered pages, it gives its findings to the indexer, which stores the pages in Google’s index database. Indexing is a way to store the information available across the web in one convenient location where it is easily accessible. This allows Google to retrieve the gathered data and display it quickly in query results. Google ignores common words such as “the”, “is”, “how” or “on”. These are called stop words, and the indexer does not take the time to process them. Stop words tend to narrow down a search unnecessarily, and discarding them produces better search results. To improve Google’s performance, the indexer also ignores some punctuation and multiple spaces, and it converts all letters to lowercase to make indexing easier and more efficient.
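The indexing steps described above (lowercasing, dropping punctuation and extra spaces, skipping stop words) can be sketched as a small inverted index, a structure that maps each word to the pages containing it. This is a minimal illustration, not Google’s indexer: the names `tokenize` and `build_index`, the page IDs and the four-word stop list (taken from the examples in the text) are assumptions.

```python
import re
from collections import defaultdict

# Only the stop words named in the text; real stop lists are much longer.
STOP_WORDS = {"the", "is", "how", "on"}


def tokenize(text):
    """Lowercase the text, drop punctuation and extra spaces, remove stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def build_index(pages):
    """Map each word to the set of page IDs that contain it (an inverted index)."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


# Hypothetical page contents for illustration.
pages = {
    "p1": "How the Crawler   Works!",
    "p2": "The indexer is on: crawler output.",
}
index = build_index(pages)
```

Here `index["crawler"]` contains both page IDs, while stop words such as “the” never get an entry at all.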
Google’s Query Processor
The query processor is made up of several parts too: a user interface, the search engine and the results formatter. Google looks at many factors to determine which indexed documents are most relevant to the query being posed. It also applies “machine-learning” techniques, which are designed to learn relationships and associations within the stored data. Google keeps the formulas it uses to determine a page’s rank in the search engine result pages (SERPs) closely guarded. This protects the system from spammers and others who might attempt to sabotage it. Google goes beyond just matching single search terms. The processor can match words and sentences, but with some of the most recent changes to the search process, Google attempts to match more than just words. Semantic search means that Google tries to match its indexed data to what the user is actually looking for, not just to certain words or phrases. In short, a user enters a search query and Google goes to work to produce relevant results that contain the desired answer.
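As a rough illustration of the word-matching part of this process, the sketch below compares a query to a tiny index and formats the results. The scoring rule (counting how many query terms a page contains) is purely an assumption for the example; Google’s real ranking formulas are not public, and semantic matching is not attempted here. The names `search` and `format_results`, the page IDs and the titles are all hypothetical.

```python
def search(query, index, limit=10):
    """Return page IDs ranked by how many distinct query terms they contain."""
    terms = set(query.lower().split())
    scores = {}
    for term in terms:
        for page_id in index.get(term, ()):
            scores[page_id] = scores.get(page_id, 0) + 1
    # Highest score first; ties are broken by page ID so the order is stable.
    ranked = sorted(scores, key=lambda page_id: (-scores[page_id], page_id))
    return ranked[:limit]


def format_results(page_ids, titles):
    """Results formatter: turn ranked page IDs into numbered display lines."""
    return [f"{rank}. {titles[pid]}" for rank, pid in enumerate(page_ids, start=1)]


# A tiny made-up index (word -> page IDs) and page titles.
index = {
    "pizza": {"p1", "p2"},
    "delivery": {"p2"},
    "recipe": {"p1", "p3"},
}
titles = {
    "p1": "Pizza recipes",
    "p2": "Pizza delivery near you",
    "p3": "Bread recipes",
}

results = search("pizza delivery", index)
lines = format_results(results, titles)
```

The page that matches both query terms is listed first, followed by the page that matches only one.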