## How google does what it does

October 29, 2010

This post is inspired by a friend of mine, who very late one night recently made a valiant attempt (given the circumstances) to explain to me how the google website-ranking system works.  I was surprised to hear that at its heart it is a simple – although quite clever – application of linear algebra.  A couple of days later, in one of those funny occurrences of suddenly encountering the same concept/thing multiple times after having spent a lifetime never hearing about it, I actually came across a very similar piece of theory in my research, and read a bit more about it.  So I thought I’d explain it for the interested among you.  However, I am swiftly learning that attempting to write an entire introduction to a mathematical subject in one blog post is a foolish thing to do!  Usually what seems to happen is that I spend the entire post writing about some fundamental aspect of the subject, and then give up and explain the rest very quickly and inadequately.  So I’m afraid I’m going to have to assume you know basic linear algebra…if not, then you might want to stop reading now.

Now, other than indexing websites and providing a portal through which to access them, clearly the most crucial aspect of a search engine is the ordering system it uses to list the sites.  There needs to be some way of assigning an “importance score” to each webpage, such that the ones which people are most likely to want to view come first.  Arguably the sole reason google are as successful as they are is a very effective method of doing this invented by Larry Page while he was at university, conveniently called PageRank.  The system uses the links to a page to determine its score, and crucially, it measures not just the number of these links but their “quality”; that is, it assigns higher importance to links coming from pages which themselves have a high score.
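To make the "links from important pages count for more" idea concrete, here is a rough sketch of how such scores could be computed by repeated averaging (power iteration).  This is only an illustration under my own assumptions, not google's actual code: the four-page `toy_web`, the function name `pagerank`, the damping value of 0.85 and the even treatment of pages with no outgoing links are all choices made for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate importance scores for a small link graph.

    `links` maps each page to the list of pages it links to.
    Every link target must itself appear as a key of `links`.
    The damping value is an assumed parameter for this sketch.
    """
    pages = sorted(links)
    n = len(pages)
    # Start with every page equally important.
    score = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of the total.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            # A page with no outgoing links spreads its score over all pages.
            targets = links[p] if links[p] else pages
            share = damping * score[p] / len(targets)
            for t in targets:
                # Each link passes on an equal slice of the linking page's
                # own score, so links from high-scoring pages are worth more.
                new[t] += share
        score = new
    return score


# Hypothetical four-page web: A, B and D all link to C; only C links to A.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy example C comes out on top because three pages link to it, and A comes second despite having only one incoming link, because that single link comes from the highest-scoring page.  D, which nobody links to, ends up last.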

## A year’s work, lessons learnt

October 11, 2010

I’m back!  And rather surprisingly, I seem to have gained a lot of readers in my absence.   Having not even logged on to WordPress for a few months, I have returned to see that my google reader subscription rate has doubled, and the number of people visiting the blog has increased by more than at any point since I started writing it.  I’m not really sure what lesson to take from this.  Probably it is just the natural result of a gradual snowballing effect: over time more people click on your site, the google rankings go up, causing more people to click on your site…

Then again, it’s possible that people just prefer it when I don’t write anything!  Well, I’m sorry to those people, but I intend to start again, although possibly even more erratically than before.

Anyway, I will shortly explain the terrible sequence of events which led me to abandon the blog, but first, a shameless Rupert Murdoch-style use of one of my products to promote another! (I would do this at the end, but am rather doubtful as to how many people actually make it to the end of my posts).  Having been introduced to Vietnamese coffee by my father-in-law a while ago, I have utterly fallen in love with it, and realised that it is very difficult to find here in the UK.  So I have set up a (very) small business selling it.  The website is here.  Try it!  You won’t be disappointed.

Sorry about that.  Now, this is what happened.  Having struggled with the proof of a knotty mathematical problem for the better part of a year, I was advised by my supervisor to publish what I had.  So I put the paper on the arXiv (an online preprint archive), not really expecting to achieve anything, but generally wanting to share the knowledge out of a spirit of altruism (and self-promotion).  Within a few hours of it appearing, a certain Peter Mueller had read it, and proved the last part of the conjecture!  (So there you go doubters: people do read your preprints).  This was wonderful news; we invited him to be a co-author, and set about writing a final draft. I also wrote a whole long blog post about how great this was, and what it all meant.  But then a couple of days later Prof. Mueller sent me some rather less good news: he had found a  mistake in my work, which completely invalidated the whole thing…

## A bunch of people, in a room

February 16, 2010

Partly in a bid to keep the interest of the small band of readers I appear to have gained since my last post, and partly out of sheer laziness, I am again going to dispense with serious mathematics this week, and instead discuss what interesting things can be said about: some people, in a room.

In his Guardian column last Saturday, after having unleashed the full extent of his fury at some poor unsuspecting tabloid for getting a statistic slightly wrong (don’t get me wrong, the media needs more people like him) Ben Goldacre mentions in passing that if there are at least 23 people in a room, the probability that 2 of them will have the same birthday is over 50%.  This is known as “the birthday paradox”, and while not technically being an actual paradox, it is highly counterintuitive, as probabilistic results often tend to be.  The counterintuitivity comes from the fact that people tend to assume the question is: “if I am in a room with some people, what is the probability of someone having the same birthday as me?” If there are 22 other people, then this gives only 22 possibilities.  But if we don’t specify the actual birth-date, the number of pairs of people, and hence the number of possible birthday-matches, becomes $\binom{23}{2}$ (the number of ways of choosing 2 things from 23 things), which is 253.  It is quite easy to believe that there is some likelihood of one of these pairs of people having the same birthday.
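For anyone who would like to check the numbers, here is a small sketch of the calculation.  It assumes 365 equally likely birthdays (so no leap years, and no seasonal bumps in birth rates), and the function names are just ones I have made up for the illustration.  The trick is to work out the probability that everybody's birthday is different, and subtract that from 1.

```python
from math import comb


def shared_birthday_probability(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def matches_my_birthday_probability(others, days=365):
    """Probability that at least one of `others` people shares MY birthday."""
    return 1.0 - ((days - 1) / days) ** others


print(comb(23, 2))                          # number of pairs among 23 people
print(shared_birthday_probability(22))      # just under one half
print(shared_birthday_probability(23))      # just over one half
print(matches_my_birthday_probability(22))  # the question people think is being asked
```

The last line is the misread version of the question, and it gives a probability of only about 6%, which is why the real answer feels so surprising.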