## All models are wrong (or: lies, damned lies and statistics)*

November 10, 2010

The statistician is seen with a certain amount of disdain (or possibly sympathy) by their pure mathematical brethren. And it is with that firmly in mind that I (as a fledgling statistician) take the reins of this worthy blog.

We have some idea of what mathematics is from Adam’s posts; but what is statistics? Statistics is applied maths with uncertainty. In statistics, mathematical techniques are used to model and quantify our uncertainty about reality. Modelling climate change, predicting the outcome of elections, wrecking the financial system and ensuring the casino always wins: statistics is everywhere. And uncertainty is the key to statistics.

To get across what uncertainty is, I will try to describe some of the different kinds we face and how statistics deals with them. The five levels in the following taxonomy lie on a continuum running from complete certainty to complete uncertainty, and provide a means of measuring the range and limitations of statistics in different situations.** The further we go along this continuum, the less effective statistics is at prediction and inference, and many problems in statistics and in quantitative social sciences like economics come from not recognising just how far along the continuum we are.

In his Guardian column last Saturday, after unleashing the full extent of his fury at some poor unsuspecting tabloid for getting a statistic slightly wrong (don’t get me wrong, the media needs more people like him), Ben Goldacre mentions in passing that if there are at least 23 people in a room, the probability that two of them share a birthday is over 50%. This is known as “the birthday paradox”, and while not technically a paradox, it is highly counterintuitive, as probabilistic results often are. The counterintuitiveness comes from the fact that people tend to assume the question is: “if I am in a room with some people, what is the probability of someone having the same birthday as me?” If there are 22 other people, then this gives only 22 possible matches. But if we don’t specify the actual birth-date, any pair of people can produce a match, so the number of possible birthday-matches becomes ${23 \choose 2}$ (the number of ways of choosing 2 things from 23 things), which is 253. It is quite easy to believe that there is a reasonable chance of at least one of these pairs sharing a birthday.
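
For anyone who wants to check the numbers, here is a rough sketch in Python. It rests on assumptions that are mine rather than anything in the column: 365 equally likely birthdays, no leap years, and everyone's birthday independent of everyone else's. The function names are made up for illustration.

```python
from math import comb


def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes `days` equally likely, independent birthdays (an idealisation).
    """
    p_all_distinct = 1.0
    for i in range(n):
        # Person i+1 must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def p_matches_me(others, days=365):
    """Probability that at least one of `others` people shares MY birthday."""
    return 1.0 - ((days - 1) / days) ** others


if __name__ == "__main__":
    print("pairs among 23 people:", comb(23, 2))
    print("P(some pair matches), n=22:", round(p_shared_birthday(22), 4))
    print("P(some pair matches), n=23:", round(p_shared_birthday(23), 4))
    print("P(someone matches me), 22 others:", round(p_matches_me(22), 4))
```

Under these assumptions the probability of some shared birthday first passes one half at 23 people, while the chance that one of 22 other people matches your own birthday stays at roughly 6%. That gap is the difference between 22 possible matches and 253.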