MRI scans from a dementia patient

Interview with John Gallacher, director of Dementias Platform UK

Image credit: Dreamstime

Professor John Gallacher explains how Dementias Platform UK (DPUK) built the world's largest secure data repository and how this model could be re-purposed to accelerate medical research.

DPUK is a public-private partnership established in 2014 to accelerate dementias research, with the aim of achieving earlier detection, better treatment and eventually prevention by facilitating sharing and coordination between researchers. At the heart of the project is the world’s largest secure data repository: an electronic data portal bringing together data from two million study participants. DPUK is directed by Professor John Gallacher of the University of Oxford’s Department of Psychiatry, is funded by the Medical Research Council (MRC) and involves 11 UK universities and several industry partners.

At the MQ Data Science Meeting 2018 in South London, E&T sat down with Gallacher to discuss how DPUK’s enormous data portal works, why it could prove so valuable and how DPUK’s models could be re-purposed for research into other conditions.

Why is this an important time for dementia research?

I think dementia has, at long last, been recognised as a time bomb. If you’re reducing mortality from infectious disease, if you’re reducing mortality from the common chronic diseases, you have more people getting older and as you have more people getting older, you also have more people getting dementia. It’s a very simple equation.

We don’t have any disease-modifying treatments for dementia. We can treat the symptoms successfully to an extent – we can improve quality of life for both patients and carers through various policies – but that’s not the point; we don’t have a disease-modifying therapy, so it’s critical that we find solutions to this problem.

Why is dementia such a challenging group of diseases to study?

It’s really due to the complexity of the brain and everything that follows on from that. For example, you can open up the chest and look at the heart, [but] you don’t get much of a view of the brain when you open up the skull, so it’s very hard to know what’s going on. Also, it’s the body’s most complex organ by far, so even if you can have a look, there are so many unknowns associated with how the brain functions that we just don’t know where to look. We’re getting better at that – to some extent dementia is a systemic condition, to some extent it’s a neuropathological condition, to some extent it’s a vascular condition – but all these things come together in the brain and it’s very, very difficult to untangle.

Could you outline DPUK’s missions?

We have three utilities. One is to foster rapid data access and we do that through our data portal. The second is to be able to recruit highly targeted individuals – well-described individuals – for trials. The third is to enable multi-centre trials which bring together industrial and academic partners. Many other things come out of that, but those are our core missions.

What was the motivation for building DPUK’s data portal?

Traditionally, epidemiologists would put together large data sets to do large analyses for ten years, but they typically do it within a very constrained consortium environment, so only the consortium has access to it and it’s only analysed in one place, which is great. Actually what we’d like to do is democratise the whole activity by bringing the data into one place, then any registered scientist can apply for access to it. So you don’t [have to] be the University of Oxford or [Massachusetts Institute of Technology] or any other big high-end [institution], you just need a good idea and then you become a registered user and you can access – but not download – the data from wherever you are in the world with a connection. I think that is fantastic.

What are the benefits of having such a large dataset?

Emerging research questions are very complicated and they need larger datasets, so all of a sudden scientists are going: “Well, where are these large datasets?” and of course we can provide that.

When you have more data you get more precise [results], but you don’t always need that level of precision. What the data portal allows you to do is to apply for multiple datasets in one go, which is much easier and it allows you to triangulate between datasets. So you might have a finding in one and then ask if that’s confirmed in another; if it is, you might be onto something. If you can confirm a finding in self-reported data, in imaging data and in genetic data, you are really onto a genuine signal rather than a chance effect.

Why didn’t a data portal like this exist previously?

I think there are two reasons: simple computing power and technical solutions to maintain the security and non-identifiability of subjects – and we’ve solved that.

How do you keep participants’ data anonymous?

[In other health databases] you can look at some data: this person had a heart attack two years ago, had a stroke three years ago, lives in this place, this age, so you can say, “Oh, I know who it is”. That must not be allowed to happen, so what you need is technical solutions to prevent that happening and they have to be bulletproof.

First of all, you separate out what you might describe as personal demographic data and then the keys that link different data together are held by a third party – an independent third party – so no research[er] can link the two things. Because there’s always a sort of identifiability-by-subtraction issue, you just make sure that things like cell frequencies are obscured, things like that.
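
For readers who want a concrete picture, the two ideas Gallacher describes (holding linkage keys apart from research data, and obscuring small cell frequencies) might look roughly like the sketch below. This is an illustration only, not DPUK's actual implementation: the function names, field names, the threshold of 5 and the sample records are all hypothetical and made up for the example.

```python
import secrets
from collections import Counter


def split_records(records):
    """Split each record into an identifying part and a research part.

    Hypothetical sketch: the key table would be held by an independent
    third party, while researchers only ever see the research rows.
    """
    key_table = {}   # pseudonym -> identifying details (third party only)
    research = []    # de-identified rows (what a researcher can query)
    for rec in records:
        pid = secrets.token_hex(8)  # random pseudonym, not derived from identity
        key_table[pid] = {"name": rec["name"], "postcode": rec["postcode"]}
        research.append(
            {"pid": pid, "age_band": rec["age_band"], "diagnosis": rec["diagnosis"]}
        )
    return key_table, research


def suppressed_counts(rows, field, threshold=5):
    """Count rows per category, masking any count below the threshold.

    Small cells are reported as "<threshold" so that rare combinations
    cannot be worked out by subtraction. The threshold is an assumption.
    """
    counts = Counter(row[field] for row in rows)
    return {
        value: (n if n >= threshold else f"<{threshold}")
        for value, n in counts.items()
    }


# Made-up sample data: 2 participants with dementia, 6 without.
records = [
    {
        "name": f"Person {i}",
        "postcode": f"ZZ{i}",
        "age_band": "70-79",
        "diagnosis": "dementia" if i < 2 else "none",
    }
    for i in range(8)
]

key_table, research = split_records(records)
print(suppressed_counts(research, "diagnosis"))
```

In this sketch the researcher-facing rows carry no names or postcodes, and the small "dementia" cell is reported only as being below the threshold rather than as an exact count.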

How do you hope the data portal and other DPUK services will be used?

Our role is not to prescribe what should be done, which is frustrating because there are many things I’d like to see done! We would love to see studies looking at co-morbidities between dementia, heart disease, stroke etc and we would love to see studies looking at the care pathway so we can influence policy.

When it comes to machine learning, that involves an enormous amount of compute power. We would have to be more planned about that so that we have sufficient machine capacity for a study, otherwise it would turn the lights out in Swansea or something like that, but for standard epidemiological applications, they can just run them.

Could DPUK’s models be used to organise study data for other conditions?

The infrastructure and architecture and principles are identical: that’s not the issue. The issue is that collecting psychiatric community-based data has always been a challenge because of stigma and funding. I think that is changing: I think the general public is much more willing to provide those data and I think funders are much more willing to resource it.

The key thing for me [is that] we have to be very clear about how we take advantage of this opportunity, because it won’t come again. So my view is that we use DPUK and repurpose it for mental health; we bring in mental health cohorts [study groups in which participants share characteristics] to beef up the data, but really what we’re doing is wanting to lay a platform for a population mental health cohort and that may well be with adolescents […] if you can do it for adolescents you can do it in adults, whereas if you can do it in adults, you don’t necessarily have the ability to do it in adolescents.

I would argue strongly for the MRC and whoever else to [support an] adolescent mental health cohort which over a period of 10, 15, 20 years will more than return investment in terms of scientific findings and generating further research.
