I use them, you use them. We all use them! Search engines make our lives so much easier whenever we get curious and need to find something online. But how do search engines work exactly?
Even though we all use them pretty much on a daily basis, we don’t necessarily think about what really happens in the background. The exact dynamics and algorithms are a science in their own right, but the basic principles are quite easy to grasp.
So in case you’re wondering what really happens before you see any results from your search, read on!
In case you’re new to programming, you might find this post about learning computer science basics quite helpful.
Here’s a short overview of the contents of this post:
- What do search engines do?
- Basic structure of a search engine: crawling, indexing, ranking
- How to build your own search engine
1. What do search engines do?
Despite being such massive and complex creatures, all search engines serve only one purpose. They are there to browse through the vast spaces of the Web to pair the user (that’s you!) with the best possible results for whatever they are curious about.
Let’s consider an example. Think of all the content, all the pages and documents on the Web. It’s quite a massive collection of information, sort of like a library with an endless number of books. It already has billions of items in it, and it’s growing all the time.
Now, imagine trying to find a book at this library that has no directory or alphabetization. In other words, the books are in no specific order. Without classification and indexing, it’s like trying to find a needle in a haystack.
How could one build a directory for this giant library to make things easier for everyone? That’s where search engines step in: they are doing their best to find the most relevant books based on what you’re looking for.
But how do search engines work exactly? That’s the question I’ll try to answer in this post. I know I’m just scratching the surface of the subject, though. The exact determinants of preferring some results over others for a given search are the best-kept secrets of the industry and vary greatly across different engines.
2. Basic structure of a search engine
So we know that all search engines aim to find you the best possible results for whatever you’re searching for.
Consider a practical example: Imagine going to the library. You’re there to find all the books with funny images of cats doing silly stuff, for example. Everybody loves silly cats! Now, where would you start looking if there was no directory or index showing you the way to the right shelf at the library with all those cute cat books?
A search engine can help you out! To do this, it needs to start by gathering information about what the Web holds (1).
Next, it uses all this information to build a directory for the Web, or an index (2).
And finally, it puts the items in the index in order by relevance (3).
Let’s take a look at each of these steps in more detail!
1: Crawling – following links and finding content
First off, a search engine has to know what the Web holds. It needs to gather information about billions of websites and their contents. It does this by starting from a set of so-called seed pages: the search engine reads each seed page to see what it is about and then follows all the links on it. Then, it does the same for the pages found behind those links.
All this data from crawling the Web is returned to the servers of the search engine. This process goes on and on for as long as the engine wishes.
This process is called crawling. The program doing the job is called a crawler, robot, or spider. Its sole task is to follow all the links on a given page and visit new pages. These pages and their contents, including the cats you’re looking for, are then added to the index, which we’ll take a look at next.
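To make the idea concrete, here is a minimal sketch of a crawler in Python. It is only an illustration under some loud assumptions: instead of downloading real pages, it walks a tiny made-up "Web" stored in a dictionary called `TOY_WEB`, and the names `LinkExtractor` and `crawl` are my own, not taken from any real search engine.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Assumption: a tiny made-up "Web" (URL -> HTML). A real crawler would
# download each page over the network instead of reading a dictionary.
TOY_WEB = {
    "cats.example/home": '<a href="cats.example/funny">funny</a> '
                         '<a href="cats.example/cute">cute</a>',
    "cats.example/funny": '<p>funny cat photos</p> '
                          '<a href="cats.example/home">home</a>',
    "cats.example/cute": '<p>cute cat photos</p> '
                         '<a href="cats.example/missing">more</a>',
}


def crawl(seed, web, max_pages=100):
    """Start at the seed page, follow links, and return {url: html}."""
    to_visit = deque([seed])  # pages we still need to look at
    seen = {seed}             # pages we have already queued
    crawled = {}
    while to_visit and len(crawled) < max_pages:
        url = to_visit.popleft()
        html = web.get(url)
        if html is None:      # dead link: nothing to read
            continue
        crawled[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                to_visit.append(link)
    return crawled


pages = crawl("cats.example/home", TOY_WEB)
print(sorted(pages))
# ['cats.example/cute', 'cats.example/funny', 'cats.example/home']
```

The `seen` set is what keeps the crawler from going around in circles when pages link back to each other, and `max_pages` is a simple stand-in for "as long as the engine wishes".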
2: Indexing – building a directory for the content
Once the pages and their contents have been crawled, they are all organized into a directory. This is called indexing. The index is simply a gigantic list of all the pages and all the content the crawler finds.
The search engine uses the index to build the search result page for you. So when you’re looking for funny cat photos, the search engine doesn’t return pages with cake recipes on them.
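One common way to organize such a directory is a so-called inverted index, which maps every word to the pages that contain it. The sketch below is a toy version with made-up page names and my own helper names (`tokenize`, `build_index`). Real engines store far more than this, so treat it as an illustration only.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


# Assumption: two made-up pages with their text already extracted.
pages = {
    "cats.example/funny": "Funny cat photos",
    "cakes.example/recipes": "Chocolate cake recipes",
}
index = build_index(pages)
print(sorted(index["cat"]))   # ['cats.example/funny']
print(sorted(index["cake"]))  # ['cakes.example/recipes']
```

Looking up a word is now a single dictionary access, no matter how many pages were crawled. That is why a search for cats never has to read the cake recipes at all.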
However, not all web pages end up in the index! For example, if Google finds several pages that have exactly the same content, i.e. they are copies of each other, it leaves most of the copies out of the index. Why? Imagine you want to find a special kind of product. Now, if all resellers use the exact same product description provided by the manufacturer, Google will decide which of them it includes in the index. There could be hundreds or thousands of resellers, and it’s better if you don’t have to go through all of them manually, right?
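How Google actually picks which copy to keep is not something I can show you, but the basic idea of spotting exact copies is easy to sketch. The example below is a simplification with hypothetical shop names: it fingerprints each page's text with a hash and keeps only the first page for each fingerprint.

```python
import hashlib


def fingerprint(text):
    """Hash the text after normalizing case and whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def drop_exact_duplicates(pages):
    """Keep only the first page seen for each distinct fingerprint."""
    seen = set()
    unique = {}
    for url, text in pages.items():
        digest = fingerprint(text)
        if digest not in seen:
            seen.add(digest)
            unique[url] = text
    return unique


# Assumption: three made-up resellers, two of which reuse the same text.
pages = {
    "shop-a.example/widget": "The best widget  ever made.",
    "shop-b.example/widget": "the best widget ever made.",
    "shop-c.example/widget": "Our own take on the widget.",
}
print(list(drop_exact_duplicates(pages)))
# ['shop-a.example/widget', 'shop-c.example/widget']
```

Note that this only catches copies that are identical after tidying up case and spacing. Pages that are merely very similar would need a different technique.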
3: Ranking – finding the most relevant search results
Finally, and most importantly, search engines need to determine how to return the best search results first.
Let’s imagine the search engine has found 150 million pages that match your search. Yikes! Somehow the engine has to go through all of them and sort them so that you will see the most relevant ones first.
This process is called ranking, and it’s ultimately what makes all the difference between search engines. In fact, Google’s ranking method PageRank made it the biggest player in the market back in the day.
Nowadays, each engine has developed complex algorithms for pairing the user – that’s you – with the best cat photo websites in the world!
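To give you a feel for the PageRank idea mentioned above, here is a heavily simplified sketch. The intuition is that a page is important if important pages link to it. The link graph, the function name `page_rank`, and the parameter values (`damping=0.85`, `iterations=50`) are assumptions I picked for the example. This is a textbook-style toy, not what Google runs today.

```python
def page_rank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}. Returns {page: score}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page gets a small base score...
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # ...plus an equal share of the score of each page linking to it.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # A page with no outgoing links spreads its score over everyone.
                share = damping * ranks[page] / n
                for other in pages:
                    new_ranks[other] += share
        ranks = new_ranks
    return ranks


# Assumption: a made-up link graph where every page links to "home".
links = {
    "home": ["funny", "cute"],
    "funny": ["home"],
    "cute": ["home"],
    "blog": ["home"],
}
ranks = page_rank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, "home" should come out on top because every other page links to it, while "blog" ends up last because nobody links to it.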
3. How to build your own search engine
For anyone interested in programming, building a search engine from scratch is great practice.
Some time ago I found a great course on Udacity aimed at Python beginners. It takes you through some really cool fundamental programming concepts, so it’s great for newbies. Or pretty much for anyone interested in coding in general.
It’s free, so go ahead and check it out!
The main project in the course is to actually write the code for a basic search engine using Python. That’s how I first got to learn more about the subject. It also introduced me to a variety of computer science concepts in general at the same time. That’s actually what inspired me to write this post in the first place!
Ready, set, search!
So what happens when you click “Search” then? The ranking algorithms give the index a proper third degree. They ask the pages in the index dozens or even hundreds of questions in order to find the most relevant results. Just for you.
And think about it, this all happens in the blink of an eye!
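Here is a tiny sketch of what that lookup could look like once an index and some relevance scores already exist. Everything in it is made up for illustration: the `index` and `scores` dictionaries are hand-written, and the single score per page stands in for the many questions a real ranking algorithm asks.

```python
def search(query, index, scores):
    """Return URLs containing every query word, highest score first."""
    words = query.lower().split()
    if not words:
        return []
    # Start with the pages for the first word, then keep only the pages
    # that also contain each of the remaining words.
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    # Sort by score (highest first); ties are broken alphabetically.
    return sorted(matches, key=lambda url: (-scores.get(url, 0.0), url))


# Assumption: a hand-written index (word -> pages) and made-up scores.
index = {
    "funny": {"cats.example/funny", "jokes.example/daily"},
    "cat": {"cats.example/funny", "cats.example/cute"},
}
scores = {
    "cats.example/funny": 0.6,
    "cats.example/cute": 0.3,
    "jokes.example/daily": 0.1,
}
print(search("funny cat", index, scores))  # ['cats.example/funny']
print(search("cat", index, scores))
# ['cats.example/funny', 'cats.example/cute']
```

Because the hard work of crawling and indexing was done ahead of time, answering a query is mostly a matter of looking things up and sorting.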
Just thinking about the amount of data being processed by search engines makes me dizzy. While I’m writing this, Google executes 58,310 searches every second. Every second!
As a whole, search engines have gone through quite an evolution in the last 20 years or so. They are incredibly massive players that are both extremely complex and delicate.
But no matter which one we’re talking about, they all have one goal: to find exactly what you’re looking for.
Thanks for reading! I hope you found the best cat photos out there. Please, feel free to share your thoughts in the comments below. And if you liked it, go ahead and share this post with others!