AI tools found using shortcuts to diagnose Covid-19
Image credit: Dreamstime
A University of Washington team has assessed multiple AI models put forward as potential tools for detecting Covid-19 in patients and found that these models rely on “shortcut learning” to reach their conclusions.
When AI tools are trained to detect disease, they can fail to recognise clinically significant indicators and instead look for shortcuts, such as, in one infamous case, the appearance of a ruler in images of skin cancer.
In this latest case, described in Nature Machine Intelligence, the models used characteristics such as text markers or patient positioning specific to each dataset, rather than genuine medical pathology, to detect the presence of the novel coronavirus.
“A physician would generally expect a finding of Covid-19 from an X-ray to be based on specific patterns in the image that reflect disease processes,” said co-lead author Alex DeGrave, a PhD candidate. “Rather than relying on those patterns, a system using shortcut learning might, for example, judge that someone is elderly and thus infer that they are more likely to have the disease because it is more common in older patients.
“The shortcut is not wrong per se, but the association is unexpected and not transparent. That could lead to an inappropriate diagnosis.”
Shortcut learning, while not technically incorrect, is not robust and is likely to cause the model to fail outside of the original setting. This can render the AI tool a serious liability, particularly given the opacity associated with AI decision-making (how a tool produces predictions is often regarded as a “black box”).
DeGrave explained: “A model that relies on shortcuts will often only work in the hospital in which it was developed, so when you take the system to a new hospital, it fails and that failure can point doctors toward the wrong diagnosis and improper treatment.”
With “explainable” AI techniques, researchers can elucidate in detail how various inputs and their weights contribute to a model’s output, illuminating the black box. DeGrave and his colleagues used these approaches to evaluate the trustworthiness of AI models that had been proposed for identifying cases of Covid-19 from chest X-rays and appeared to produce good results.
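As a rough illustration of what such attribution techniques do, the sketch below uses input-times-gradient on a hypothetical linear scorer: for a linear model, the gradient of the score with respect to each pixel is simply its weight, so multiplying weights by inputs yields a per-pixel influence map. This is a minimal toy, not the authors' actual method, and all names and values are invented.

```python
import numpy as np

# Toy "model": a linear scorer over flattened image pixels.
# For a linear model, the gradient of the score w.r.t. each pixel
# is just that pixel's weight, so input * gradient gives a simple
# per-pixel attribution map (one basic "explainable AI" idea).
rng = np.random.default_rng(0)
weights = rng.normal(size=16)   # hypothetical learned weights
image = rng.normal(size=16)     # hypothetical flattened X-ray patch

score = float(weights @ image)  # model output (a logit)
attribution = weights * image   # input-times-gradient attribution

# For a bias-free linear model, attributions sum to the score.
assert np.isclose(attribution.sum(), score)

# Rank pixels by how strongly they influenced the prediction.
top_pixels = np.argsort(-np.abs(attribution))[:3]
print("score:", round(score, 3), "most influential pixels:", top_pixels)
```

In a real system the model is a deep network and the attributions highlight image regions; if the brightest regions turn out to be text markers or image borders rather than lung tissue, that is evidence of shortcut learning.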
The team reasoned that these models would be prone to a condition known as “worst-case confounding,” owing to the lack of training data available for a disease as new as Covid-19; this increased the likelihood that the models would rely on shortcuts rather than learning the underlying pathology of the disease from the data.
“Worst-case confounding is what allows an AI system to just learn to recognise datasets instead of learning any true disease pathology," said co-lead author Joseph Janizek, who is also a PhD candidate. “It's what happens when all of the Covid-19 positive cases come from a single dataset while all of the negative cases are in another.
“And while researchers have come up with techniques to mitigate associations like this in cases where those associations are less severe, these techniques don't work in situations where you have a perfect association between an outcome such as Covid-19 status and a factor like the data source.”
Replicating the approach used in the published papers, the researchers trained multiple deep convolutional neural networks on X-ray images. First, they tested each model’s performance on an internal set of images from that initial dataset that had been withheld from the training data. Then, they tested how well the models performed on a second dataset meant to represent new hospital systems. While the models performed well when tested on images from the first dataset, their accuracy halved when tested on images from the second dataset.
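This internal-versus-external evaluation protocol can be sketched with a toy stand-in model (a nearest-centroid classifier over two invented features, not the paper's CNNs): trained on data where a dataset marker is confounded with the label, it looks excellent on a held-out internal split but falls to near chance on data resembling a new hospital.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n, marker_tracks_label):
    """Toy 2-feature 'images': [weak true disease signal, dataset marker]."""
    y = rng.integers(0, 2, size=n)
    signal = 0.2 * y + rng.normal(0, 1.0, size=n)  # weak genuine pathology
    marker = y if marker_tracks_label else rng.integers(0, 2, size=n)
    return np.column_stack([signal, marker.astype(float)]), y

# Training and internal test data share the confound; external data does not.
X_train, y_train = make_data(2000, marker_tracks_label=True)
X_int, y_int = make_data(500, marker_tracks_label=True)
X_ext, y_ext = make_data(500, marker_tracks_label=False)

# Nearest-centroid classifier as a stand-in for the trained networks.
c0 = X_train[y_train == 0].mean(axis=0)
c1 = X_train[y_train == 1].mean(axis=0)

def predict(X):
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    return (d1 < d0).astype(int)

internal_acc = (predict(X_int) == y_int).mean()
external_acc = (predict(X_ext) == y_ext).mean()
print(f"internal: {internal_acc:.2f}, external: {external_acc:.2f}")
```

Because the marker separates the classes far more cleanly than the weak disease signal, the model leans on it and collapses once the marker stops tracking the label, mirroring the halved accuracy the researchers observed.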
The researchers then applied explainable AI techniques to identify which image features were influencing the predictions most heavily. They found that shortcuts were being taken using clues such as the positioning of the patient and text markers on the images.
“My team and I are still optimistic about the clinical viability of AI for medical imaging. I believe we will eventually have reliable ways to prevent AI from learning shortcuts, but it's going to take some more work to get there," said Professor Su-In Lee. “Going forward, explainable AI is going to be an essential tool for ensuring these models can be used safely and effectively to augment medical decision-making and achieve better outcomes for patients.”