This post is inspired by a friend of mine, who very late one night recently made a valiant attempt (given the circumstances) to explain to me how the google website-ranking system works. I was surprised to hear that at its heart it is a simple – although quite clever – application of linear algebra. A couple of days later, in one of those funny occurrences of suddenly encountering the same concept/thing multiple times after having spent a lifetime never hearing about it, I actually came across a very similar piece of theory in my research, and read a bit more about it. So I thought I’d explain it for the interested among you. However, I am swiftly learning that attempting to write an entire introduction to a mathematical subject in one blog post is a foolish thing to do! Usually what seems to happen is that I spend the entire post writing about some fundamental aspect of the subject, and then give up and explain the rest very quickly and inadequately. So I’m afraid I’m going to have to assume you know basic linear algebra…if not, then you might want to stop reading now.
Now, other than indexing websites and providing a portal through which to access them, clearly the most crucial aspect of a search engine is the ordering system it uses to list the sites. There needs to be some way of assigning an “importance score” to each webpage, such that the ones which people are most likely to want view come first. Arguably the sole reason google are as successful as they are is a very effective method of doing this invented by Larry Page while he was at university, conveniently called PageRank. The system uses the links to a page to determine its score, and crucially, it measures not just the number of these links but their “quality”; that is, it assigns higher importance to links coming from pages which themselves have a high score.