A new machine learning radiomics model by AI company Zegami uses images of infected lungs to help radiologists more quickly identify coronavirus cases.

Covid-19 versus genomics and other advanced technologies

Image credit: Zegami

Genome sequencing, big data and artificial intelligence are helping doctors to better understand, treat and hopefully beat Covid-19.

The global scientific response to the novel coronavirus pandemic, which so far has killed over 328,000 people worldwide, is unprecedented. On 10 January 2020, nine days after the first cases of suspected Covid-19 were identified, the first genome sequence of the virus was shared publicly. Since then, tens of thousands of samples have been sequenced.

Genomics, which is concerned with the genetic material of an organism, is one of the most promising areas of research for Covid-19. By unlocking the virus’ genetic code and that of the most severely affected hosts – the patients – experts hope to better inform public health decisions and find effective treatments.

Working to this end is the £20m UK government-funded Cog-UK research consortium, which consolidates the resources of the highly regarded Sanger Institute, the NHS and leading universities. The alliance has already sequenced over 16,000 viral samples from patients with confirmed cases of Covid-19.

Detailed analysis of sequenced viral samples can identify small changes in the virus as it passes through the population, and these changes can then be used to track its spread.

“As a virus replicates itself in different hosts, it accumulates small ‘typos’ in its code called mutations. While the vast majority of mutations are not functional, by identifying them in different viral samples we can track and trace the infections’ spread locally and from one to another,” explains Emma Hodcroft, a post-doctoral researcher at the University of Basel in Switzerland.

“If two samples have the same typos, it means they probably come from a parent virus that also has these typos, and so can be identified as more closely related or from the same infection chain,” she adds.
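The reasoning Hodcroft describes can be sketched in a few lines of Python. This is an illustrative toy only, not Nextstrain's actual pipeline: the reference and sample sequences are invented, the function names `mutations` and `shared_mutations` are hypothetical, and the sketch assumes the sequences are already aligned and of equal length.

```python
def mutations(reference: str, sample: str) -> set:
    """Return the 'typos' in a sample as (position, reference_base, sample_base)."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return {
        (pos, ref_base, base)
        for pos, (ref_base, base) in enumerate(zip(reference, sample), start=1)
        if ref_base != base
    }


def shared_mutations(reference: str, sample_a: str, sample_b: str) -> set:
    """Return the mutations that two samples have in common."""
    return mutations(reference, sample_a) & mutations(reference, sample_b)


# Invented toy data (assumption): real viral genomes are roughly 30,000 bases long.
REFERENCE = "ACGTACGTAC"
SAMPLES = {
    "sample_a": "ACGTTCGTAC",  # differs from the reference at position 5
    "sample_b": "ACGTTCGTAA",  # differs at positions 5 and 10
    "sample_c": "ACCTACGTAC",  # differs at position 3
}

if __name__ == "__main__":
    a, b, c = (SAMPLES[name] for name in ("sample_a", "sample_b", "sample_c"))
    print("a and b share:", shared_mutations(REFERENCE, a, b))
    print("a and c share:", shared_mutations(REFERENCE, a, c))
```

In this toy data, samples a and b share a mutation and so would be judged more closely related to each other than either is to sample c, which shares none with them. Real analyses build a full family tree from many such comparisons rather than looking at pairs in isolation.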

Hodcroft is currently working on Nextstrain, a SARS-CoV-2 open-source project that provides a continually updated view of publicly available genome-sequencing data, alongside analytic and visualisation tools. From across the globe, nearly 20,000 sequences have been uploaded to the Global Initiative on Sharing All Influenza Data (GISAID), including some from Cog-UK. Researchers at Nextstrain are using this data to create a family tree of the virus’ spread.

“From the first few sequences, we could identify similarities and confidently say these viruses had emerged very recently, within the past couple of months, in China. The genetics then led us to cases in other countries directly related to those Chinese samples,” explains Hodcroft.

“Because of the fast sharing of data, we are providing a real-time look at the pandemic, in a way previously not possible. I really hope this will transform how we can track other diseases in the future,” she adds.

This underlying approach of using the genome of a pathogen to understand how it spreads, called genomic epidemiology, was pioneered during the AIDS epidemic in the 1990s and has expanded to other pathogens such as influenza. The falling cost of sequencing technology has made it increasingly accessible.

Dr Lauren Cowley, prize fellow of bioinformatics at the Department of Biology and Biochemistry at the University of Bath, used this tracing method in 2015 to track the spread of Ebola in Guinea. Using MinION, a portable sequencer made by Oxford Nanopore Technologies, Cowley and her colleagues could determine the relatedness of samples from patients.

“Roughly every two weeks the Ebola virus changes something in its genome, therefore if two samples had exactly the same sequence, then we would know they were likely part of the same transmission chain,” explains Cowley.
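The rule of thumb Cowley describes, that identical sequences point to the same transmission chain, can be sketched as a simple grouping step. This is a minimal illustration under stated assumptions, not the method her team used: the patient IDs and sequences are invented, the function name `group_identical` is hypothetical, and a real analysis would also weigh near-identical sequences and sampling dates.

```python
from collections import defaultdict


def group_identical(samples: dict) -> list:
    """Group patient IDs whose viral sequences are exactly the same.

    Only groups with more than one patient are returned, since a single
    sample says nothing about a shared transmission chain.
    """
    groups = defaultdict(list)
    for patient_id, sequence in samples.items():
        groups[sequence].append(patient_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Invented toy data (assumption): short stand-ins for full viral genomes.
TOY_SAMPLES = {
    "patient_1": "ACGTTCGTAC",
    "patient_2": "ACGTACGTAC",
    "patient_3": "ACGTTCGTAC",
    "patient_4": "ACCTACGTAC",
}

if __name__ == "__main__":
    for chain in group_identical(TOY_SAMPLES):
        print("Likely the same transmission chain:", chain)
```

Here patients 1 and 3 carry identical sequences and are grouped together, while patients 2 and 4 stand alone.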

“This helped epidemiologists track whether a transmission chain was contained or whether more people were at risk and if there were contacts of the patients that needed to be monitored for symptom development.”

Similarly, in its first public update at the end of March, Cog-UK said it had identified 12 viral lineages in the initial 260 viral genomes it sequenced, suggesting independent introductions of Covid-19 to the UK coming from areas with large epidemics and high travel volumes, notably Italy and other parts of Europe.

Hodcroft says this technology will become particularly useful for informing public health decisions towards the end of the pandemic.

“If we can determine new cases in a city are from local transmission, it tells us current measures are not working because the virus is spreading locally again. However, if it shows new cases are imported, then we know we need to be careful about people travelling from other areas. This is important when trying to understand how much to loosen restrictions on the public or to find weaknesses in your strategy,” she explains.

It is hoped that the ongoing research at the Cog-UK consortium, which Hodcroft says is “above and beyond what any other country is doing”, along with the antibody testing just approved by Public Health England, will help the government better understand infection among the UK population, down to individual transmission chains.

A characteristic of coronavirus that has healthcare professionals puzzled is why certain people are more adversely affected than others. While this could be explained by many factors, there’s a hypothesis that mutations in a person’s genetics could affect how they react to the disease and their chances of surviving it.

Everybody has a human genome in every single cell, and by and large, the code is the same, apart from some sporadic mutations. These change parts of the genome; some are incredibly rare and others very common. 

“We don't know how much of the variation in Covid-19 outcomes is driven by common genetic effects, some of which may be acting through frequently seen comorbidities (like diabetes or cardiovascular disease); or by rarer mutations, which predispose people to poor outcomes possibly related to different immune responses or uncontrolled inflammatory events,” explains Nicholas Timpson, professor of genetic epidemiology at the University of Bristol and a Wellcome Trust Investigator.

Timpson works on the University of Bristol’s Children of the '90s study, which has been collecting "everything from toenails and teeth" from a cohort of children since birth. Timpson and his colleagues are now surveying participants about how they have been affected by Covid-19 and hope to use this information to assist ongoing medical research into the disease.

“For example, we’ve been measuring respiratory health in participants for decades, so we’re in a very special position because we can bring retrospective data forward into the analysis; past healthcare trajectory could be extremely important in understanding who gets better from Covid-19 and who is badly affected,” he says.

Similarly, consumer genetics testing and analysis company 23andMe has enrolled more than half a million of its customers in a study to find potential genetic associations related to the severity of coronavirus symptoms. The company will be studying de-identified, aggregate genetic information alongside answers to survey questions on experience with Covid-19 symptoms to get a fuller picture of potential correlations.

Identifying these genetic markers could help target the development of specific treatments and vaccines for coronavirus. Timpson, however, says this can be difficult because, unlike rare and specific changes in genomes, there may be a common variation that affects a significant chunk of the population, but its actual impact, though very real, is very small.

However, technologies such as artificial intelligence (AI) and machine learning can help speed up this analysis, especially when working with sequenced genomes, which produce huge amounts of data.

“Measuring the entire genome and working in a data-driven way, rather than generating hypotheses about which genes would be involved in which diseases, can be more efficient,” says Timpson.

Swiss health-tech company SOPHiA GENETICS, which developed an AI-based platform that precisely analyses raw genome data to help clinicians better diagnose patients, is working in this way with its partner Paragon Genomics to help researchers make genetic discoveries related to Covid-19 outcomes.

The company wants to create a ‘multi-modal’ approach to predict outcomes and tailor therapeutic approaches.

“Using the genome of the virus and the host, combined with data about how the patient was treated and what happened to them, the SOPHiA platform could identify patterns by looking for a combination of data points to predict a patient’s clinical outcome and recommend treatments based on previous results of other patients with similar signatures,” explains Dr Philippe Menu, chief medical officer at SOPHiA GENETICS.

The platform is already trained to do this for lung cancer patients using analysis of CT scans, known as radiomics, and other clinical data. For coronavirus, it could be used to triage patients better. “The vision is to develop an optimised predictive score across genomics, radiomics and clinical data, that helps doctors predict the most likely Covid-19 disease evolution at time of diagnosis and tailor therapeutic interventions accordingly,” says Menu.

The platform is currently going through a validation phase for sequencing the whole viral genome. Once there is enough data, it will start looking for variations across viral samples. To pursue the multimodal analysis, Menu says the company is in discussions with different centres.

Similarly, in only a matter of weeks, AI-based drug-discovery company BenevolentAI used its machine-learning platform to identify a potential drug to treat some Covid-19 patients.

Using a biomedical ‘Knowledge Graph’ it had curated over the past five years, researchers assessed potential treatments that could specifically inhibit the cellular processes the virus uses to infect human cells and reduce inflammatory damage. The predictive tools identified an existing rheumatoid arthritis pill, baricitinib, as a potential treatment. The drug is now being trialled by Eli Lilly. 

In April, NHSX, the technology arm of the NHS, announced it was establishing a centralised UK repository of chest X-ray, CT and MRI images for use by AI applications to improve the understanding of Covid-19 and support treatment of the disease. 

Zegami, an Oxford University spin-out, has developed a new machine-learning radiomics model on its AI platform. The company hopes to use these images to help radiologists identify coronavirus cases more quickly and to improve treatment outcomes by learning from past successes.

Doug Lawrence, a data scientist at Zegami who has been training the platform, says it has already shown 70-75 per cent accuracy in distinguishing coronavirus cases from images of viral and bacterial pneumonia, as well as from images of healthy lungs, using a limited dataset of 226 images of Covid-19-infected lungs.

“A tool that can filter people into a high or lower risk bracket, even at only 70 per cent accuracy, is still very useful in saving radiologists time,” he says.

The longer-term ambition of the company, however, is to receive anonymous information about the treatment plan and outcome for each patient image.

“If we had data about people in intensive care or who were treated with specific antibiotics, the platform could predict potential outcomes and recommend treatments based on this data,” says Stephen Taylor, co-founder and chief scientific officer of Zegami. “It’s about binding the metadata with the image to give doctors more confidence in treatment and diagnosis.”

But Taylor says the nature of the platform means it could easily be used to explore a range of hypotheses.

“There's a whole bunch of characteristics you can measure. I think this provides a simple and easy-to-use interface from which it’s possible to investigate different parameters without doing lots of coding – putting this tool in the hands of non-data scientists is very powerful because they can come up with interesting hypotheses and then test them,” says Taylor.

Zegami has applied to NHSX for chest X-ray images, which it is hoping to receive soon.

While a vaccine for the novel coronavirus is still in development, there is hope that the throng of ongoing research can help with the management and treatment of the virus in the interim. In fact, there is a clear race to make discoveries and provide healthcare professionals with new tools. It will be interesting to see who is successful first.

One thing is certain though: the rapid rate of research, cross-border collaboration and fast deployment of technologies are among the few positives to emerge from the coronavirus crisis.

Health study

Children of the '90s

If you were born in or around Bristol in 1991 or 1992, then you could have been part of the Children of the '90s health study.

It doesn't matter if you stopped taking part years ago: your data is important and you can re-join the study at any time.

To find out if you were involved in the study, please text your full name and date of birth to 07772 909090 or visit childrenofthe90s.ac.uk.
