Avalanche of data

Book review: ‘The Art of Statistics: Learning from Data’ by David Spiegelhalter

Image credit: Dreamstime

A lesson in how to distinguish the good numbers from the bad and the ugly.

For an example of how numbers can be used to mislead and lead a country into chaos, one need look no further than Vote Leave’s Brexit campaign and its notorious red bus carrying the message: “We send the EU £350 million a week; let’s fund the NHS instead”. The UK Statistics Authority judged that the claim was "misleading and undermined trust in official statistics," but it nevertheless helped to muddy voters' understanding as part of a number-driven disinformation campaign whose consequences reverberate to this day.

With ‘The Art of Statistics: Learning from Data’ (Pelican, £16.99, ISBN 9780241398630), British statistician David Spiegelhalter comes to the rescue in an attempt to head off similar ‘number abuse’ in future lobbying exercises.

Spiegelhalter, chair of the Winton Centre for Risk and Evidence Communication in the Statistical Laboratory at the University of Cambridge, aims to help non-statisticians gain trust in their own ability to investigate data by teaching how sound statistics really works and how readers can tell the good from the bad and the ugly. His mission is to help laypeople - especially those who may have despised statistics back in school - discover a fervour for the art of analysing and communicating data.

Although intended for a general readership, the result doesn’t just scratch the surface of the subject. Despite dodging the math-bullets as far as possible, Spiegelhalter descends the rabbit hole to a remarkable depth. All too aware from teaching statistics at university that it can be dull, he insists that it is more than “esoteric formula with which generations of students have struggled”. His technique for avoiding boredom on the path to becoming a data detective is the Problem, Plan, Data, Analysis, Conclusion, or PPDAC, problem-solving cycle. A fresh idea that both students and teachers can pick up, it “underscores that formal techniques for statistical analysis play only one part of a statistician and data scientist”.

Like the fictional investigator Sherlock Holmes, Spiegelhalter takes readers on a trail to challenge the methodology and stats thrown at us by the media and others. But where other authors have attempted this and failed, he is inventive and clever in picking examples that spark readers' interest in investigating for themselves. The appeal of his cases lies in the fact that they are based on questions that occupy most of us, such as the numbers of lifetime opposite-sex partners reported by men and women - and why neither can be completely trusted in their statements.

He also reminds us that national newspapers and journalists need to up their game in reporting numbers and techniques in visual communication of charts and data. The book is in part an appeal to editors to question and thoroughly check before stories go out that feature data, especially if figures quoted appear too good - or bad - to be true.

Spiegelhalter challenges the idea that math comes first, followed much later by the computer-aided calculations that help statisticians accomplish much of the heavy lifting in analysing data. Tools that most of us have access to - a computer, free open-source software such as R, creativity and a good portion of inquisitiveness - are sufficient to go on and investigate whether rules and biases are consistently heeded.

Case-studies encourage readers to arm themselves with these tools and knowledge - perhaps in the form of a simple decision-tree model - and swap their Netflix binge for an evening at their laptop working out “the luckiest passenger on the Titanic”.

Or there’s the way in which the legal system uses statistics in investigations to help a court reach its final judgment. Spiegelhalter goes one step further and suggests how much sooner - roughly 15 years, he estimates - the gruesome chain of hundreds of murders by Harold Shipman, an English general practitioner and one of the most prolific serial killers in history, could have been uncovered and the culprit caught if modest statistical inference had been used to analyse data about patient deaths.

‘The Art of Statistics’ addresses the reason why understanding statistics is essential to modern-day life - because numbers rarely speak for themselves. It’s as much about how to avoid being fooled as it is about how to understand numbers.

Adopting Spiegelhalter's approach, I felt compelled to apply similar rigour to judging the book with the help of data. Review rating data accessible freely on Goodreads, a social cataloguing website for books, were obtained for 1,997 books with the word 'statistics' in the title. With an average rating of 4 (“really liked it”), Spiegelhalter's book is seemingly hitting the right notes for readers. But, as the very content of his book teaches us, we need to be cautious and check on the raw data and sample size.

Infographic: Statistical analysis of statistics books

Image credit: Ben Heubl/E&T

With just two ratings given at the time of writing in early April 2019, however, it needs time before its average rating can reliably be used in judging how it compares with other statistics books... Spiegelhalter himself would perhaps advise a pinch of caution before drawing any rapid conclusions.
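To see why two ratings are too few, a rough sketch in Python may help. It is only an illustration: the ratings below are invented, not the Goodreads figures, and the helper name `mean_with_margin` is hypothetical. It uses the textbook normal approximation for the margin of error of a mean (1.96 times the standard deviation divided by the square root of the sample size), which if anything understates the uncertainty when the sample is this small.

```python
import math
import statistics


def mean_with_margin(ratings, z=1.96):
    """Return (mean, margin), where margin is a rough 95% half-width.

    Uses the normal approximation z * s / sqrt(n). With very small
    samples this is optimistic, so treat it as a lower bound on doubt.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings to estimate spread")
    mean = statistics.mean(ratings)
    margin = z * statistics.stdev(ratings) / math.sqrt(n)
    return mean, margin


# Invented ratings on a 1-5 star scale (assumption, not real data).
few_ratings = [3, 5]            # two ratings, average 4
many_ratings = [3, 5] * 100     # two hundred ratings, same average

mean_few, margin_few = mean_with_margin(few_ratings)
mean_many, margin_many = mean_with_margin(many_ratings)

print(f"2 ratings:   {mean_few:.1f} +/- {margin_few:.2f}")
print(f"200 ratings: {mean_many:.1f} +/- {margin_many:.2f}")
```

Both samples average four stars, but the margin around the two-rating average spans roughly two stars either way, while the margin around the two-hundred-rating average is a small fraction of a star.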
