Presagen Webinar Series: Using Artificial Intelligence to identify inherent errors in medical data that even experts can’t detect

In January 2023, Dr Don Perugini presented the webinar “Using Artificial Intelligence to identify inherent errors in medical data that even experts can’t detect”. This is the first in a series of webinars on AI in healthcare, IVF, and digital health. Below are the transcript of the presentation, the webinar video, the related Nature Scientific Reports paper, and the presentation slides.

 
 

Slide 1

Hi everyone, thanks for joining me.

I’m Don Perugini from Presagen. Today I will be talking about how we can use Artificial Intelligence to detect inherent errors in medical data that even experts can’t detect.

Slide 2

Medical data is inherently poor quality and can contain a significant number of errors.

We have seen datasets where almost a third of the dataset comprised errors.

Where do the errors come from?

Errors can come from subjectivity, where different experts have differing opinions about a medical outcome.

Or they can come from uncertainty. This arises, for example, where medical tests are inaccurate, or where unknown factors impact medical outcomes due to complex biology that we do not yet fully understand.

Slide 3

Errors in medical data can have significant consequences when we need to use the data for patient care.

Good quality data is vital for healthcare applications such as Artificial Intelligence and precision health, as well as clinical trials where we rely on the data to help us determine whether medical treatments are safe to use.

Slide 4

I will give a simple example of how inherent errors can plague medical datasets.

Let's assume we want to identify symptoms associated with Covid patients.

This is easy enough. We just need to collect symptom data from patients with Covid and without Covid.

Slide 5

However, what if Rapid Antigen Tests, or RAT tests, were used to diagnose whether a patient had Covid or not?

These tests are not perfect.

Slide 6

RAT tests can have an error rate of over 20% in correctly diagnosing that Covid patients actually have Covid.

This means that the symptom dataset we are collecting will mislabel around 20% of the Covid patients as not having Covid.

You can imagine how these errors could impact the reliability and accuracy of using that data to identify symptoms related to Covid.

Even though this is a simple made-up example, these inherent errors are quite common in healthcare.
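To make this concrete, here is a minimal Python sketch (not from the webinar or the paper) that simulates how a test with an assumed 20% false-negative rate silently mislabels patients in the collected dataset. The cohort size and prevalence are made up purely for illustration.

```python
# Illustrative sketch only: simulate how an imperfect diagnostic test corrupts labels.
# The cohort size, prevalence, and 20% false-negative rate are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(seed=0)

n_patients = 1000
truly_positive = rng.random(n_patients) < 0.3      # assume 30% of the cohort truly has Covid
false_negative_rate = 0.20                          # assumed RAT false-negative rate

# The recorded label comes from the test result, not from the (unknown) ground truth.
test_positive = truly_positive & (rng.random(n_patients) >= false_negative_rate)

mislabelled = truly_positive & ~test_positive
print(f"Truly positive patients:       {truly_positive.sum()}")
print(f"Recorded as negative (errors): {mislabelled.sum()} "
      f"(~{mislabelled.sum() / truly_positive.sum():.0%} of true positives)")
```

These mislabelled patients then sit in the symptom dataset looking like ordinary negatives, which is exactly the kind of inherent error described above.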

Slide 7

There is always an assumption that experts can manually verify and correct these errors.

As you can see from the previous Covid example, this is not always the case.

In cases where it is possible to have experts manually verify and correct errors, it is not always practical.

Barriers include scalability, where the size of the data to be reviewed is too large, or growing too fast, to keep up with.

Data privacy policies may prevent third parties from manually inspecting the data, particularly across borders between different countries.

And a lack of contextual information means that experts may not be able to reliably verify the data.

Given these barriers, what can we do to solve this problem?

Slide 8

There is a solution, and it uses artificial intelligence.

The algorithm is called the UDC, and it has been published in Nature Scientific Reports. You can download the paper from our website, Presagen.com.

Basically, the UDC identifies inconsistencies in data, which are typically errors.

The great thing is that it is automated and does not require manual verification.

Hence, the algorithm preserves data privacy and is scalable to large datasets.

Slide 9

So how does the algorithm work?

Let’s take the problem of training an AI algorithm to identify pictures of cats and dogs.

Slide 10

During training, the AI learns patterns and features that are specific to cats and dogs.

When training is complete, you can give the AI a new picture, and the AI will use these learned features to tell you whether the image contains a cat or dog.

Slide 11

However, what if there is an error in the dataset used for training?

What if we introduce an image of a dog and label it a ‘cat’?

The AI does not know it is actually an image of a dog; it learns based on the data we give it.

The AI algorithm will get confused because there will be inconsistencies in the dataset.

It is like trying to fit a square peg into a round hole.

During training, as the AI strives to find patterns that match the erroneous dog image labelled as a cat with images of other cats, it will more often match the features of other dogs instead.

Therefore, during training, the AI will consistently get that erroneous dog image incorrect.

Slide 12

This is effectively how the UDC works.

The UDC stands for Untrainable Data Cleansing, and that is effectively what happens to erroneous data.

They become inconsistent and untrainable.

During training, the AI will consistently get these data points incorrect, which indicates that the data is likely erroneous or mis-labelled.

In this case, you can either re-label the image of the dog from ‘cat’ to ‘dog’, or what we prefer to do, remove that suspicious data point from the dataset altogether.
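As a rough illustration of this idea (and not the published UDC algorithm itself), the sketch below flags samples that a model misclassifies in almost every repeat of cross-validation. The model choice, number of repeats, and threshold are assumptions for the sake of the example.

```python
# Illustrative sketch only: flag samples that a model consistently gets wrong across
# repeated cross-validation. This captures the intuition described in the webinar,
# not the exact UDC algorithm from the paper.
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression

def flag_untrainable(X, y, n_splits=5, n_repeats=10, threshold=0.9, seed=0):
    """Return indices of samples misclassified in at least `threshold` of held-out evaluations."""
    wrong = np.zeros(len(y))
    seen = np.zeros(len(y))
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    for train_idx, test_idx in cv.split(X, y):
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        wrong[test_idx] += (pred != y[test_idx])
        seen[test_idx] += 1
    untrainability = wrong / seen   # fraction of times each sample was misclassified
    return np.where(untrainability >= threshold)[0], untrainability

# Usage: drop the flagged samples (or send them back for relabelling) and retrain.
# suspicious, scores = flag_untrainable(X, y)
# X_clean, y_clean = np.delete(X, suspicious, axis=0), np.delete(y, suspicious)
```

The key design choice is that no one ever has to look at the images themselves: the flag comes purely from how the data behaves during training, which is why this style of approach preserves privacy and scales.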

Slide 13

So how well does the UDC work?

We tried the UDC on a range of different datasets, from non-medical datasets, including images of cats and dogs and vehicles, to medical datasets, including images of embryos and chest x-rays, and even non-image medical record data.

And we found that the UDC worked exceedingly well.

Slide 14

Starting with cats and dogs, we intentionally introduced up to 50% errors in cat images, and up to 30% errors in both cats and dogs at the same time.

We showed that the UDC was highly successful in automatically detecting the erroneous data, and improved the accuracy of the AI in identifying cats and dogs by 20% to 30%.

Slide 15

Similarly for vehicles, we introduced up to 70% errors in the dataset.

Again, the UDC was effective in identifying the erroneous data, and the AI accuracy in identifying vehicles improved by up to 45%, which is significant.
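For readers who want to reproduce the flavour of these experiments on their own data, here is a hedged sketch of the general protocol on synthetic data: inject label errors into one class, drop the training samples whose held-out predictions disagree with their labels (a crude stand-in for the UDC), and compare accuracy before and after cleansing. The dataset, model, and noise level below are all assumptions, not the paper's setup.

```python
# Illustrative experiment sketch (assumed protocol, not the paper's exact setup):
# inject synthetic label errors, clean the training data, and compare accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Inject ~30% label errors into one class of the training set.
noisy = y_train.copy()
class0 = np.where(noisy == 0)[0]
flipped = rng.choice(class0, size=int(0.3 * len(class0)), replace=False)
noisy[flipped] = 1

def accuracy_after_training(Xtr, ytr):
    clf = RandomForestClassifier(random_state=0).fit(Xtr, ytr)
    return clf.score(X_test, y_test)

# Crude stand-in for the UDC: drop training samples whose held-out prediction
# disagrees with their (possibly corrupted) label.
held_out_pred = cross_val_predict(RandomForestClassifier(random_state=0), X_train, noisy, cv=5)
keep = held_out_pred == noisy

print("Accuracy trained on noisy labels: ", accuracy_after_training(X_train, noisy))
print("Accuracy after dropping suspects: ", accuracy_after_training(X_train[keep], noisy[keep]))
```

The accuracy gap between the noisy and cleansed runs is the kind of improvement being reported in these slides, although the published results come from the UDC itself rather than this simplified stand-in.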

Slide 16

The first medical example is images of embryos from IVF patients.

This example relates to an actual AI product now in use around the world by IVF patients.

The aim is to use AI to non-invasively and instantly identify the genetic integrity of embryos just using an image.

Currently, the genetic integrity of embryos is assessed using an invasive and costly technique called PGT-A.

PGT-A requires that the embryo be biopsied and genetically tested for abnormalities, like Down syndrome.

Due to the nature of the PGT-A test, there can be errors, or an overestimate of the number of embryos labelled as genetically abnormal, which are then unnecessarily discarded.

Surprisingly, the UDC algorithm discovered that around 37% of embryo images were likely mis-labelled as genetically abnormal when they should have been normal.

After the UDC removed these errors, there was a 10% accuracy improvement in the AI in identifying the genetic integrity of embryos.

Slide 17

The next medical image example is an interesting one. We used the UDC on x-ray images that are used to detect pneumonia.

This example is interesting because the UDC was not used to identify errors in the data.

Rather, the UDC was used to identify x-ray images that are difficult to assess, by either expert radiologists or an AI algorithm, because they lack clearly identifying features needed to make a definitive or clear assessment.

These noisy images also have a negative impact on AI training, and removing them improved the AI accuracy by up to 25%.

In this case, in addition to improving AI performance, techniques like the UDC can be used as a triage tool to help radiologists identify images that are likely to be inconclusive or difficult to diagnose, so they can request additional tests.
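As a simple illustration of how such a triage step might look in practice, the sketch below thresholds a per-image "untrainability" score (for example, the misclassification fraction from the earlier sketch) to build a review worklist. The function name and threshold are hypothetical.

```python
# Hypothetical triage helper: scores are assumed to be per-image "untrainability"
# values between 0 and 1 (e.g. the misclassification fraction from the earlier sketch).
from typing import Dict, List, Tuple

def triage_images(scores: Dict[str, float], review_threshold: float = 0.8) -> Tuple[List[str], List[str]]:
    """Split image IDs into those flagged for radiologist review and those left to the AI read."""
    needs_review = [img for img, s in scores.items() if s >= review_threshold]
    auto_read = [img for img, s in scores.items() if s < review_threshold]
    return needs_review, auto_read

# Example usage with made-up scores:
# review_list, auto_list = triage_images({"xray_001": 0.95, "xray_002": 0.10})
```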

Slide 18

Lastly, we have shown the UDC is applicable to non-image based medical datasets.

We applied the UDC to a medical record-based dataset, and again the UDC was able to identify the errors, and improve AI accuracy.

Therefore, the UDC can be applied to any dataset, including non-AI healthcare applications, such as finding errors in clinical trial data or in other medical data used to support targeted treatment for patients.

Slide 19

There are many potential applications of the UDC.

It can be used to automatically detect errors in data which is private and cannot be manually accessed or viewed.

Or used to detect errors in massive datasets where manual verification is impractical.

We hypothesize that there are two other potential use cases.

The UDC could potentially detect anomalies in data.

It could also be used to assist with labelling or prediction for data that is difficult to predict, or where the statistical distribution varies widely between datasets.

Slide 20

I would like to thank you for joining this webinar.

This presentation and the paper on the UDC are available on our website at Presagen.com.

Thank you.