
‘I see data as a tool for making decisions’: scientist Simon Maskell on tackling Covid-19

Image credit: Getty Images

Simon Maskell, a data scientist at the University of Liverpool, discusses his current work on a data system to support the UK government’s response to Covid-19.

When it comes to the way in which Covid-19 mortality data is presented in the media – or any other data related to the spread of infection, transport usage or international comparisons – the trouble is that “while you are looking at very clear numbers, it’s not exactly clear what those numbers mean”.

These are the words of Simon Maskell, data scientist at the University of Liverpool or, to give him his full job title (which he admits is “not particularly snappy”), Professor of Autonomous Systems in the School of Electrical Engineering, Electronics and Computer Science. Whichever way you describe what he does, he’s an acknowledged academic expert in data science. Only don’t call him ‘professor’ or ‘doctor’. “Simon will do. I don’t go in for all that hierarchical management stuff.”

“I’d be much happier,” he says, “if we said that 543.7 people was our best estimate of how many people had died. Of course, I’d be even happier if the best estimate was zero.” The point he’s making is that, as a data scientist, he doesn’t want to see figures rounded to whole numbers, because rounding creates a false impression of precision.

“There’s always error. However, what would the public think if the government presented figures on the screen and said they didn’t know how many people died in hospital that day?” If you don’t know, says Maskell, you should be “more transparent” and say you don’t know, “instead of putting up these crazy graphs that say this is precisely the number of people who died in hospital and this is our ‘best fit’ line”.

On the other hand, “it’s pretty difficult to convey what’s really a probability distribution. I’d prefer to see a ‘heat-map’ that appeared to be smudged up and down, where the extent of the smudge would convey the extent of the uncertainty.” The problem, as Maskell explains, is that uncertainty is perceived politically as weakness, while for the mathematician, “it’s just reality”.
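Maskell’s preference for a “smudged” estimate over a single crisp number can be illustrated with a minimal Python sketch. It assumes a model’s output is a set of samples from a probability distribution (here a made-up Gaussian centred on the 543.7 figure he quotes, not real data) and reports the mean together with a central 90% interval instead of a rounded integer. The function name `summarise` is hypothetical.

```python
import random
import statistics

def summarise(samples):
    """Summarise samples by their mean and a central 90% interval."""
    # 19 cut points splitting the data into 20 equal groups:
    # the first is the 5th percentile, the last the 95th.
    cuts = statistics.quantiles(samples, n=20, method="inclusive")
    return statistics.fmean(samples), cuts[0], cuts[-1]

# Illustrative only: draws from an invented distribution centred on 543.7,
# standing in for the output of a real epidemiological model.
rng = random.Random(42)
draws = [rng.gauss(543.7, 20.0) for _ in range(10_000)]

best, low, high = summarise(draws)
print(f"best estimate {best:.1f} (90% interval {low:.1f} to {high:.1f})")
```

Reporting the interval alongside the best estimate is one simple way to convey “the extent of the smudge”; a heat-map would plot the same kind of distribution for each day.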

Both the public and politicians, continues Maskell, want figures that look decisive. “The nice thing about being decisive is you might be right. So, if you’re lucky and you’re right, you’re going to look great, but I’d prefer decisions to be made based on the data. As a data scientist, I live and breathe uncertainty. What we are being presented as fact isn’t fact. It’s belief and we should stop presenting belief and perception as fact, because it isn’t.”

Maskell is now focusing his research at Liverpool on supporting the UK government’s response to Covid-19. He’s developing a data system that will bring together data from disparate sources to improve understanding of the pandemic. His short-term aim is a system that can be used, for example, to provide daily forecasts of ICU (intensive care unit) bed demand for each NHS region, or to help decision-makers agree on appropriate timings for adjusting social-distancing measures by assessing the societal, health and economic impacts. The idea is to present policymakers with a data dashboard that can feed into the process of understanding what the ‘new normal’ might look like.

The data used includes “quite reliable data” on deaths in hospitals from the Office for National Statistics, “noisy data” on hospital admissions, “even noisier data” from social media and “even noisier data than that” from apps.
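A standard textbook way to combine estimates of differing reliability is inverse-variance weighting, sketched below in Python. This is not necessarily what Maskell’s system does: it assumes every source gives an unbiased estimate of the same quantity with independent Gaussian noise, and the source labels, values and noise levels are invented for illustration. A real system would also need a model linking different kinds of data (admissions to deaths, for example).

```python
from dataclasses import dataclass

@dataclass
class Reading:
    source: str      # hypothetical label for a data source
    value: float     # that source's estimate of the quantity of interest
    std_dev: float   # assumed noise level (standard deviation) of the source

def fuse(readings):
    """Inverse-variance weighted mean of independent readings, plus its std dev."""
    weights = [1.0 / r.std_dev ** 2 for r in readings]  # noisier => smaller weight
    total = sum(weights)
    mean = sum(w * r.value for w, r in zip(weights, readings)) / total
    return mean, total ** -0.5

# Invented numbers, ordered from most to least reliable source.
READINGS = [
    Reading("reliable", 100.0, 2.0),
    Reading("noisy", 110.0, 5.0),
    Reading("even noisier", 130.0, 15.0),
    Reading("noisier still", 90.0, 30.0),
]

fused_mean, fused_sd = fuse(READINGS)
print(f"fused estimate {fused_mean:.1f} +/- {fused_sd:.1f}")
```

The two noisiest sources barely move the result, yet the fused uncertainty is still smaller than that of the best single source, which is the payoff of combining data.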

“We want to bring that all together with data regarding the movement of people around the UK, to model the spread of the virus in the population and the response of interventions.” An example of ‘response of intervention’ might be, “if we came out of lockdown tomorrow, what might happen? If I give people a tracking app, what would happen?... ‘what if’ questions about the future.”

Yet, says Maskell, the importance of the project goes beyond simply combining data. “It is also about being able to make better assumptions. The models might use, for example, census data from 2011 on how people commute to work. Given that everybody is working from home at the moment that is a pretty stupid thing to use, if you have a choice. If you could instead use data on where people went yesterday, because the Department for Transport had been talking to individuals who could provide that information, that would provide a much better feed than something that’s nine years old and from a reality that simply isn’t now.”

The reason data is so important, says Maskell, is that “people believe it to be objective”. When we listen to the five o’clock government briefings, “a word we will hear a lot is ‘model’”. The word may sound objective, but it is also bandied about in a quasi-scientific way to inspire faith in the figures that follow. Such statements should be treated with caution, because “models are just a concatenation of various assumptions the creator of the model might consider to be important, and so therefore can be quite subjective. A model will have things included by design, things that are in there by accident, and things that might be very important.

“However, it is my experience that it is very difficult to know which is which. Models aren’t transparent and fair. They can be politically loaded before they are put before people, but data is data and a fair measuring stick, particularly when you are trying to do things like predict the future. Which is pretty much the world we are living in at the moment.”

One characteristic of this world is that “we are living in an emergency. This means we need to make different judgment calls, which retrospectively may turn out to have been not the right thing to do. I think it’s perfectly OK when responding to a crisis to use the best data you can find when you first look. Yet I think we’re getting to the point with Covid-19 now where we’re starting to realise that this is a marathon and not a sprint, which means we can afford to take a slightly more considered and constructive view of what’s going on.”


Perhaps counterintuitively, realising that you can pause to catch your breath and reassess your position comes with a risk attached: “as our knowledge of a situation improves, we may wish to change our approach to it”, which can look like uncertainty. This creates a public perception problem, because “for politicians, changing your mind is a really awkward thing to do”, and it carries a political cost. From the engineer’s perspective, however, “you make the decision based on information available to you at the time. Then when you get more data, you might come to realise you made the wrong decision,” and you course-correct.

While, in theory at least, this should make the overall response better suited to the requirements of the emergency, it isn’t always politically expedient to change direction, “and that’s what spurs me on to want to provide the best infrastructure to support a more considered and objective view. I think that’s where data science can really add something.”

Maskell is an engineer by background and comes to academia from a career in industry, the UK Ministry of Defence and other security services. “I thought I wanted to become a mechanical engineer and so I did a year with the R&D department of a high-tech company of sorts, manufacturing stairlifts.” The result was that Maskell became less certain that he wanted to be a mechanical engineer and went to the University of Cambridge on an IET scholarship, where the most important lesson he learned was that “you could get a job with computers. That realisation was amazing. I’d been taking them apart, rebuilding them and programming them for years, but it never dawned on me that you could actually do that as a job.”

Maskell describes engineering as “home”, which means “in terms of what I do today, while some would describe it as computer science or statistics, I would call it engineering, and my engineering genes lead me to want to solve problems”. One reason he’s come to accept the shortcomings in the way in which Covid-19-related data is presented to the public is that “I don’t see data as a quest for truth. I see it as a tool for making decisions. You can probably make a good decision sooner than you can know the truth.” In the grip of a public health crisis, “time is life. Making quick decisions, even when there is ambiguity present, is, for me, really important. That interface between engineering, computer science and statistics is what people refer to as data science. The reason I feel twitchy about calling data science just that, is because it isn’t a quest for truth. I’d like it if there was a different name for it, but I think we’re stuck with it.”

Having reached a philosophical conclusion on the relationship between truth and data, Maskell considers what this could mean in practice. There follows a lengthy disquisition on strategies for solving big data problems by chopping them up and spreading them around. Essentially, says Maskell, if a problem could be divided into 100 parts that were independent of one another, you’d be better off getting 100 people to address one part each than getting one person, no matter how expert, to address the whole lot.

Not only would you reach your answer faster, but you’d be more likely to reach a ‘better’ answer (i.e. a lone solver could be wrong, but it’s unlikely that all 100 solvers working on component parts of the problem would be wrong at the same time). However, you can’t look at problems this way using “the computer you have on your desk in front of you. You need a big computer with tens of thousands of graphics cards. Wouldn’t it be a good idea to use computers like this to solve problems not by spreading the data, but the uncertainty over the computer? You describe the uncertainty in a sequence of hypotheses.”

What this means in layman’s terms is that each of the 100 solvers “says ‘this is what I think might be going on’, goes away, analyses the data and comes back saying ‘this is what I found out in the context of my particular enquiry’. The great thing is that the team outperforms the single person working on the problem. One hundred people working for an hour each is better than one person working for 100 hours. That really is quite profound. It’s a bit like the wisdom of crowds idea, though it’s more like the safety net of teamwork.”
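The idea of spreading uncertainty, rather than data, across many processors resembles importance sampling, the building block of sequential Monte Carlo (particle filter) methods. The Python sketch below is an illustration under stated assumptions, not Maskell’s algorithm: each of 100 hypotheses about a daily growth rate is scored against made-up counts under an assumed Poisson growth model, independently of the others, and the scores are then normalised into weights. The counts, the model and the function names are all hypothetical.

```python
import math
import random

# Hypothetical daily counts, invented for illustration only.
OBSERVED = [100, 108, 119, 127, 140, 151, 166]

def log_likelihood(rate, data=OBSERVED):
    """Poisson log-likelihood of the data if counts grow by exp(rate) per day."""
    total = 0.0
    for day, count in enumerate(data):
        expected = data[0] * math.exp(rate * day)
        total += count * math.log(expected) - expected - math.lgamma(count + 1)
    return total

def weigh_hypotheses(hypotheses):
    """Score each hypothesis independently, then normalise the scores to weights."""
    # Each call is independent of the others, so this map could be spread
    # across many cores or GPUs; here it simply runs serially.
    log_w = list(map(log_likelihood, hypotheses))
    peak = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - peak) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]

rng = random.Random(1)
# 100 "solvers", each starting from its own guess at the daily growth rate,
# drawn from an assumed uniform prior.
hypotheses = [rng.uniform(-0.1, 0.2) for _ in range(100)]
weights = weigh_hypotheses(hypotheses)
estimate = sum(h * w for h, w in zip(hypotheses, weights))
print(f"weighted estimate of daily growth rate: {estimate:.3f}")
```

Because no hypothesis needs to know about any other, the scoring step is the part that could be farmed out to thousands of graphics cards; only the final normalisation needs everyone’s results.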

At this point, Maskell says that if we can massively increase the computer power we can throw at the extant Covid-19 data, the benefits start to mount up. “If you have a model from Imperial College that runs in 25 minutes on four cores, but you can then run it over thousands of graphics processing units, then you can be so much more ambitious in terms of the assumptions you pack in and assumptions you can test. What this means is that you are no longer constrained by choice. You can let the data choose for you, and the data is, at least in theory, fair. The data can decide whether to trust in a model from Finland, Germany, Cambridge or wherever.”
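Letting “the data decide” between competing models has a standard Bayesian form: weight each model by its prior probability times its marginal likelihood (evidence), then normalise. The Python sketch below assumes the log-evidence of each model has already been estimated, which is the computationally expensive step that extra hardware would speed up. The model names and numbers are invented and do not correspond to any real group’s model.

```python
import math

def model_probabilities(log_evidence, log_prior=None):
    """Posterior probability of each model given its log marginal likelihood."""
    names = list(log_evidence)
    if log_prior is None:  # equal prior belief in every model
        log_prior = {n: -math.log(len(names)) for n in names}
    scores = {n: log_evidence[n] + log_prior[n] for n in names}
    peak = max(scores.values())  # log-sum-exp trick for numerical stability
    norm = peak + math.log(sum(math.exp(s - peak) for s in scores.values()))
    return {n: math.exp(s - norm) for n, s in scores.items()}

# Made-up log-evidence values for three hypothetical candidate models.
probs = model_probabilities({"model_a": -1203.4, "model_b": -1201.1, "model_c": -1210.9})
for name, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {p:.3f}")
```

Rather than picking one model by debate, the posterior probabilities can also be used to average the models’ forecasts, so that each contributes in proportion to how well it explains the data.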

I then ask the 44-year-old British engineer to step back from the granularity of data science and sum up in a few words what this all means for a government and public anxious for better information about the uncertain world they are living in.

“Oh, I see,” says Maskell. “It speeds up predicting the future. And not just a bit. It’s tens of thousands of times better. You can speed up – dramatically – the ability to analyse future government policy decisions. But just speeding up the same analysis is about as useful as a chocolate teapot. However, speeding up analysis means you can be more ambitious in terms of models and data sources you use, so you can reconcile the apparent conflict between all the different conclusions that are drawn. Crucially, this can be achieved by the data rather than through political debate.”
