Improbable events, like Leicester City's recent Premier League win, break our brains, so statisticians and sports fans alike need to be careful.
It’s 2016 and Elvis is alive, shambling forth from some deep Mississippian backwater as a gathering crowd photographs the octogenarian in his rotting jumpsuit. Across the Atlantic, an even more ancient creature, some kind of Mesozoic monster, hauls its hissing bulk from a loch in Scotland. Making his international cricket debut for England at number four, President Barack Obama drives Ishant Sharma defensively through the covers for two. And Leicester City F.C., favourites for the wooden spoon, win the English Premier League with two games to spare.
There is no doubt that all of the above were almost unthinkable this time last year: each scenario was given the same 5000-1 odds by William Hill bookmakers in 2015, and betting agencies don’t tend to offer longer odds than that for fear of having to make a massive payout. But that fear has now become reality with Leicester City’s recent premiership win. Bookmakers are expected to pay close to £10m to the punters devoted or reckless enough to bet on Leicester City at the start of the season, but it’s hard to fault them for setting the odds so long. This is, after all, the single greatest upset in the history of sport.
To really comprehend the unexpectedness of Leicester City’s triumph, consider it with reference to the well-worn underdog tale from antiquity: that of David and the giant Goliath. Here we have a duel in which David has a sling and stones against Goliath’s javelin, and must disable or kill Goliath with his first shot or be swiftly stabbed to death. This is a difficult task, but David is a skilled slinger, so we can see his victory as improbable but not implausible. If we knew the full capabilities of the combatants before the duel began, we could assign each competitor a probability of success. For David, that chance would be low, say 25%, which in turn gives Goliath a 75% chance of victory. Of course, in Biblical lore David won the fight, but as a one-off event this victory is fairly unremarkable, 2,000 years of Christian idolisation aside.
Now imagine we were to travel to a distant galaxy and encounter an alien race that had been observing humanity from afar, and by some strange galactic peculiarity had found our story of David and Goliath so fascinating that they decided to recreate the duel themselves, so that they could watch it on repeat in perpetuity. For this they have engineered a quintessential David and a quintessential Goliath from human cells, and then produced millions of identical clones of each and raised the clones to fight as gladiators, one-on-one, in their otherworldly colosseum.
Not wishing to offend our hosts, we take our seat at the arena’s edge and observe a series of bloody duels. In the first, David’s stone misses Goliath, who grabs David’s wrist and splits his head like a melon with his javelin; in the second, the stone strikes Goliath’s forehead and the giant collapses in a heap; in the third, the stone strikes Goliath feebly in the chest and he runs David through. In any single duel David either wins or loses, so his success rate for that one duel is either 100% or 0%. But once we have witnessed many duels in succession, the proportion won by David ends up close to 25%. It is not too unlikely that we see David win every time if we witness only two duels (0.25² ≈ 6.3% chance), but for David to win 10 duels out of 10 is much less likely, occurring only once in every 1,048,576 sets of 10 fights (0.25¹⁰ ≈ 0.000095% chance).
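The duel arithmetic can be checked in a few lines of Python. This is a minimal sketch assuming, as the text does, that each duel is independent and that David wins any one of them with probability 0.25; the function names are hypothetical. The simulation illustrates the point that David’s share of wins settles near 25% only over many duels.

```python
import random

P_DAVID = 0.25  # assumed chance that a clonal David wins any single duel

def prob_wins_every_duel(n_duels: int, p: float = P_DAVID) -> float:
    """Probability that David wins all of n independent duels."""
    return p ** n_duels

def simulate_win_share(n_duels: int, p: float = P_DAVID, seed: int = 0) -> float:
    """Simulate n independent duels and return the share won by David."""
    rng = random.Random(seed)
    wins = sum(rng.random() < p for _ in range(n_duels))
    return wins / n_duels

print(prob_wins_every_duel(2))        # 0.0625, i.e. about 6.3%
print(1 / prob_wins_every_duel(10))   # 1048576.0 sets of ten fights per clean sweep
print(simulate_win_share(100_000))    # close to 0.25
```

A single simulated duel can only return 0.0 or 1.0; the 25% figure emerges from the long run.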
Now we turn to Leicester City F.C., not as the league champions they are now, but as they were perceived at the beginning of the 2015/2016 season. Sportswriters the world over almost unanimously expected Leicester City to end up around the bottom of the ladder and be relegated to a lower division. Looking at the bottom of the ladder now, we find Aston Villa F.C., who managed just three wins to their 27 losses. If we ignore draws, which generally occur at about the same frequency from the top to the bottom of the ladder, we can see that Aston Villa lost nine times as many games as they won. They performed far worse than our clonal Davids, who were only three times as likely to lose as to win. As Aston Villa’s performance was particularly poor, we can instead look at the average win/loss ratio of a relegated team over the past few years, which works out to around three losses for every win, the same ratio as David’s against Goliath. Thus we can estimate the probability that this expected-to-be-relegated Leicester City wins a non-drawn game against a Premier League opponent: 25%.
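Turning a win/loss record into a per-game win probability, with draws ignored as in the text, is simple arithmetic. A minimal sketch follows; the function name is hypothetical, and the one-win-per-three-losses ratio for relegated teams is the text’s rough average rather than a figure computed here.

```python
from fractions import Fraction

def win_probability(wins: int, losses: int) -> Fraction:
    """Share of non-drawn games won; draws are ignored, as in the text."""
    return Fraction(wins, wins + losses)

villa = win_probability(3, 27)       # Aston Villa 2015/16: 3 wins, 27 losses
relegated = win_probability(1, 3)    # rough average: three losses per win
leicester = win_probability(23, 3)   # Leicester City 2015/16: 23 wins, 3 losses
print(villa, relegated, leicester)   # 1/10 1/4 23/26
```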
Of course, the Leicester City imagined last year by sportswriters and bookmakers is worlds apart from the team that just won the title. But how far apart are the expectation and the reality? Here we can use the predicted win rate of the imagined Leicester City to investigate how likely it would be for such a second-rate team to win at least as many games as the real Leicester City managed this season. Of their games not resulting in draws, Leicester City won 23 and lost only three. For a team with a 25% chance of winning each of these 26 games, winning 23 or more would happen only once every 61,521,223,258 Premier League seasons. It would have been a safer bet that the Sun would envelop the Earth first, which is expected to happen after only about 7,500,000,000 seasons (at one season per year), though claiming one’s winnings afterward could prove more difficult.
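The figure of one season in 61,521,223,258 is the upper tail of a binomial distribution: the probability of 23 or more wins in 26 games, each won independently with probability 0.25. A minimal sketch that computes it exactly with Python’s `fractions.Fraction` and `math.comb`, assuming (as the text does) that games are independent and that draws are excluded:

```python
from fractions import Fraction
from math import comb

def binomial_upper_tail(n: int, k_min: int, p: Fraction) -> Fraction:
    """Exact probability of at least k_min successes in n independent trials."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# Inputs from the text: 26 non-drawn games, a 25% chance of winning each.
p_tail = binomial_upper_tail(26, 23, Fraction(1, 4))
print(float(p_tail))        # roughly 1.6e-11
print(round(1 / p_tail))    # 61521223258 seasons per occurrence
```

Exact fractions avoid any floating-point doubt about a probability this small; the exact tail is 73,204 / 4²⁶.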
Scientists of all sorts, from geneticists to meteorologists to particle physicists, use statistical inference like the above to investigate patterns in the natural world. To show how this works, we can return to the alien planet from before. We are observing yet another round of the endless combat between David and Goliath when some members of a new alien race, from an even more distant galaxy, make their way to the front of the arena and take their seats on the terrace next to ours. These aliens have no knowledge of humans and their fighting styles, and as such are watching the games with no preconceived idea of which of the two parties, David or Goliath, is more likely to emerge victorious. They are the scientists of our story, dispassionate observers of an unfamiliar but knowable nature.
After several rounds of combat, they begin to develop a hypothesis amongst themselves – the larger human, Goliath, seems to be the stronger competitor, as he has won more bouts than the other. But how can they know for sure that this is the case, and not just the result of random chance? A tossed coin has no greater likelihood of a head than a tail, yet we expect to see occasional runs of 5 or more heads in a row – could they simply be witnessing a string of victories going to one class of duelist against his equally capable opponent?
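To get a feel for how readily chance produces such streaks, the sketch below computes exactly the probability that a run of heads of a given length appears somewhere in a sequence of fair coin tosses. It is a minimal illustration, not part of the original analysis; the function name `prob_run_of_heads` and the choice of 50 tosses are assumptions.

```python
from fractions import Fraction

def prob_run_of_heads(n_tosses: int, run_length: int) -> Fraction:
    """Exact probability that n fair coin tosses contain at least one run of
    `run_length` or more heads in a row."""
    half = Fraction(1, 2)
    # states[j]: probability that no qualifying run has appeared yet and the
    # sequence currently ends in exactly j consecutive heads.
    states = [Fraction(0)] * run_length
    states[0] = Fraction(1)
    for _ in range(n_tosses):
        new_states = [Fraction(0)] * run_length
        for j, prob in enumerate(states):
            new_states[0] += prob * half          # tails resets the streak
            if j + 1 < run_length:
                new_states[j + 1] += prob * half  # heads extends a short streak
            # otherwise heads completes a qualifying run, so that probability
            # leaves the "no run yet" states for good
        states = new_states
    return 1 - sum(states)

print(float(prob_run_of_heads(50, 5)))  # chance of 5+ heads in a row within 50 tosses
```

Tracking only the length of the current streak keeps the calculation small, instead of enumerating all 2⁵⁰ possible sequences.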
To solve this, scientists invoke statistical confidence: how confident they can be that their observations reflect a genuine underlying system (the hypothesis) rather than simply the result of random chance (the null hypothesis). Weighing observations against the null hypothesis in this way is called null hypothesis significance testing (NHST). NHST grew out of methods set out by Ronald Fisher in 1925, and has since become one of the most commonly employed methods of scientific data analysis. In the simple system of our alien arena, the two main factors that determine the statistical confidence of the aliens’ conclusions are sample size (the number of duels) and the size of the difference between the two treatments (Goliath’s advantage over David). In this case, Goliath’s advantage is immense, so the newcomer aliens will not need to observe many bouts before they are confident that Goliath is the better competitor. After 50 rounds of combat, these rational aliens would almost certainly reach this conclusion; they would fail to do so only if, in the 50 duels they witnessed, the clonal Davids happened to win many more bouts than they would on average. However, if the competitors were more evenly matched, and Goliath had only a 55% chance of winning, the aliens would have to sit and watch for much longer before they could be confident that he was the stronger of the two.
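How Goliath’s advantage and the number of duels together determine the aliens’ confidence can be sketched with an exact one-sided binomial test. This is a minimal sketch under assumptions not stated in the text: the aliens take "evenly matched" (a 50% chance for Goliath) as the null hypothesis, use the conventional 5% significance cutoff, and count only Goliath’s wins; all function names are hypothetical. `power` is the chance that the aliens correctly conclude Goliath is the stronger duelist.

```python
from math import comb

def upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum((comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1)), 0.0)

def critical_wins(n: int, alpha: float = 0.05) -> int:
    """Fewest Goliath wins out of n duels that evenly matched duelists would
    reach with probability at most alpha (the significance cutoff)."""
    for k in range(n + 1):
        if upper_tail(n, k, 0.5) <= alpha:
            return k
    return n + 1  # too few duels: no result can be significant

def power(n: int, p_goliath: float, alpha: float = 0.05) -> float:
    """Chance the observers reject 'evenly matched' when Goliath truly wins
    each duel with probability p_goliath."""
    return upper_tail(n, critical_wins(n, alpha), p_goliath)

for n, p in [(50, 0.75), (50, 0.55), (500, 0.55)]:
    print(n, p, round(power(n, p), 3))
```

With a 75% edge, 50 duels almost always suffice; with a 55% edge, 50 duels usually do not, and it takes many more duels before the aliens are likely to be convinced.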
These simple principles of statistical reasoning are used in scientific experimental design. A medical researcher might compare two groups of patients, one given a placebo and the other the drug or therapy under study. However, unlike our alien observers, the researcher will not have access to a limitless number of study replicates: in this case, willing participants who suffer from the illness being researched. In the case of rare illnesses, sample sizes may be so small that statistical power is almost completely eroded, and only a treatment that produces a very strong effect in patients will be detected as statistically significant.
Admittedly, this concept of statistical significance rests on a rather arbitrary convention. The typical cutoff is p < 0.05: a result is called significant when, if the null hypothesis were true, data at least as extreme would turn up less than 5% of the time. That 5% value reflects no deep mathematical law. Moreover, as Goodhart’s Law states, when a measure becomes a target, it ceases to be a good measure, and reaching p < 0.05 has in practice become a target for getting results published. This has disconcerted some scientists enough for them to suggest abandoning NHST in favour of alternative approaches.
Much has been written and broadcast in recent years about p-hacking: the manipulation of statistical tests and study design to coerce results into statistical significance. But even well-meaning researchers can be misled when their sample sizes are small, because underpowered studies make a larger share of their “significant” results false positives. Through the popular media, the lay public is bombarded with results from so many studies giving contradictory advice, often about nutrition and health, that people can simply pick and choose which ones to believe based on what suits them. This breeds distrust and devalues science as a means by which we can genuinely improve societies.
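The point about small samples can be made concrete with a standard back-of-the-envelope calculation: among all results that clear the significance cutoff, what fraction are false positives? This is a sketch with hypothetical inputs, not figures from real studies: we assume only 10% of tested hypotheses are actually true, a 5% cutoff, and two levels of statistical power (80% for a well-sized study, 20% for a small one).

```python
def false_positive_share(power: float, alpha: float = 0.05, prior_true: float = 0.1) -> float:
    """Share of 'significant' results that are false positives.

    prior_true is a hypothetical share of tested hypotheses that are really true.
    """
    true_positives = power * prior_true          # real effects that get detected
    false_positives = alpha * (1 - prior_true)   # null effects that cross the cutoff anyway
    return false_positives / (true_positives + false_positives)

for study_power in (0.8, 0.2):  # hypothetical well-sized vs. small study
    print(study_power, round(false_positive_share(study_power), 2))
```

With these made-up inputs, about 36% of significant results are false at 80% power, rising to about 69% at 20% power: the false positive rate per test stays at 5%, but small studies detect so few real effects that false alarms make up more of what gets reported.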
Returning to Leicester City and the English Premier League, we can see the league itself as a test of how good each of the teams is. Each team has 38 games to argue its case, a sample size that for some is more than enough to establish a winner but which others might find too small: some disgruntled fans of the other, richer teams feel their team was actually better than Leicester City but just didn’t get to prove it. But the Premier League is no science. No amount of debate will take away Leicester City’s league title and the astonishingly improbable marvel it is.
Edited by Jack Scanlan.