Role of Artificial Intelligence in Radiological Imaging,” here are the key US FDA regulatory considerations you should be aware of.
1. AI software applications are fundamentally different in that an AI algorithm is created and improved by feeding it data so it can learn, and eventually, if it implements Deep Learning, it can learn and improve autonomously based on new data. AI is a big business opportunity.
According to an analysis by Accenture, the market for AI applications for preliminary diagnosis and automated diagnosis is $8 billion. The same analysis points out that there will be a 20 percent unmet demand for clinicians in the US by 2026, which AI can help address.
It became clear during the conference that the prediction made in November of 2016 by Geoffrey Hinton, that deep learning would put radiologists out of a job within 5 years, was a gross miscalculation. No jobs have been lost as of today. By contrast, the number of studies to be reviewed is increasing to almost 100 billion images per year, to be read by approximately 34,000 radiologists, who have to read more and more images faster and more efficiently. The use of AI to eliminate “normal” cases, especially for screening exams such as for breast cancer or TB in chest images, will be a big relief for radiologists.
2. AI will not make radiologists obsolete but rather
will change their focus as the image by itself might become less important than
the overall patient context. We spend a lot of time improving
image quality by reducing image artifacts and increasing resolution so a
physician can make a better diagnosis. However, as one of the speakers brought
up, using autonomous AI could potentially eliminate the need to create an
image, by basing the diagnosis directly on the information in the raw data. Why
would we need an image? Remember, the image was created to optimally present
information to a human, ideally matching our eye-brain detection and
interpretation. If we apply the AI algorithm on the acquired data without
worrying about the image, we could use it on CT raw data streaming straight
from the detector, or the signals directly from the MR high frequency coils,
the ultrasound sound waves, or the EKG electrical signals, or whatever
information comes from any kind of detector. Images have served the physicians
very well for many years. In some cases, “medical imaging” will be implemented
without the need to produce an image and we might need to rename it to become
“medical diagnosing” instead. I believe that a radiologist is first and
foremost an MD and thinking that they will be out of a job when there is less
of an emphasis on the images seems misguided.
3. AI algorithms are often focused on a single characteristic, which is a problem when they are used in an autonomous mode, because incidental findings can go unnoticed. Two good examples were given during the workshop. The first was an ultrasound of the heart of a fetus, which looked perfectly normal, so an AI algorithm looking for heart defects would pass it as OK. However, in this particular case, as shown in the image, the heart was outside the chest (ectopia cordis), a rare condition that, if present, should be diagnosed early so it can be treated accordingly. The other example was autonomous AI detection of fractures. Fractures are very common in children, as I can attest personally, having many grandkids who are very active. One of the speakers mentioned that in some cases there are incidental findings of bone cancer when looking at the fracture, something that a “fracture algorithm” would not detect. So, maybe my previous hypothesis that the image might eventually become obsolete is not quite correct, unless we have an all-encompassing AI detection algorithm that can identify every potential finding.
The problem with creating an all-encompassing AI is that there are some very rare findings and diseases for which there is relatively little data available. It is easy to get access to tens of thousands of chest or breast images with lung or breast cancer from the public domain, for example from NCI. For rare cases, however, there might not be enough data available to train and validate an AI algorithm in a statistically significant way.
4. There are still many legal questions and concerns about AI applications. As an analogy, the electric car company Tesla is being sued right now by the surviving family of the person who died after his car crashed into a highway median because the autopilot misread the lane lines. Many people die crashing into medians because of human error; however, there is much less tolerance for errors made by machines than for errors made by humans. The question is who is accountable if an algorithm fails with subsequent patient harm or even death: the hospital, the responsible physician, or the vendor of the AI algorithm?
5. A discussion about any new technology would not be complete without a discussion about standards. How is an algorithm integrated into an existing PACS viewer or medical device software, and how is the output of the AI encoded? The IHE has just released two profiles that address both the AI results and the workflow integration. Implementors are encouraged to support these standards and potential users are encouraged to request them in their RFPs.
6. There are three different US FDA regulatory approval and oversight classifications for medical devices and software:
1. Class 1: Low risk, such as an image router. This classification requires General Controls to be applied (Good Manufacturing Practices, complaint handling, etc.).
2. Class 2: Moderate risk, such as a PACS system or medical monitor, as well as Computer Aided Detection software. This classification requires both general and special controls to be applied. These devices and software require a 510(k) premarket clearance.
For a moderate risk device that does NOT have a predicate device, a new procedure has been developed, known as a “de novo” filing. For example, the first Computer Aided Acquisition device, which was approved in January 2020, followed the de novo process.
3. Class 3: High risk, such as Computer Aided Diagnosis, which requires general controls AND Premarket Approval (PMA).
7. AI can be divided into the following categories:
a. CADe or Computer Aided Detection - Aids in localizing and marking regions that may reveal specific abnormalities. The first application was for breast CAD, initially approved in 1997, followed by several other organ CAD applications. CADe has recently (as of January 2020) been reclassified to NOT need a PMA but rather to be class 2, needing only a 510(k).
b. CADx or Computer Aided Diagnosis - Aids in characterizing and assessing disease type, severity, stage, and progression.
c. CADe/x or Computer Aided Detection and Diagnosis - This is a combination of the first two categories, as it both localizes and characterizes the condition.
d. CADt or Computer Aided Triage - Aids in prioritizing/triaging time-sensitive patient detection and diagnosis. Based on a CADe and/or CADx finding, it could immediately alert a physician or put the study at the top of a worklist to be evaluated.
e. CADa/o or Computer Aided Acquisition/Optimization - Aids in the acquisition/optimization of images and diagnostic signals. The first CADa/o was approved in January 2020 for ultrasound, to help non-medical users acquire images. Being first-in-class, it followed the de novo clearance process.
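To make the CADt idea of putting a study “on the top of a worklist” a bit more concrete, here is a minimal sketch of a priority-ordered worklist. It is only an illustration: the `TriageWorklist` class, the study IDs, and the `suspicion_score` values are hypothetical and do not describe any cleared product or PACS interface.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class WorklistEntry:
    priority: float                        # negated score, so the highest score sorts first
    sequence: int                          # arrival order, used to break ties
    study_id: str = field(compare=False)   # not used for ordering


class TriageWorklist:
    """Hypothetical reading worklist ordered by an AI suspicion score."""

    def __init__(self):
        self._heap = []
        self._arrivals = count()

    def add_study(self, study_id, suspicion_score):
        # heapq is a min-heap, so the score is negated to pop the highest first.
        entry = WorklistEntry(-suspicion_score, next(self._arrivals), study_id)
        heapq.heappush(self._heap, entry)

    def next_study(self):
        return heapq.heappop(self._heap).study_id


worklist = TriageWorklist()
worklist.add_study("chest-001", 0.10)      # assumed scores in the range 0..1
worklist.add_study("head-ct-002", 0.95)
worklist.add_study("chest-003", 0.40)

reading_order = [worklist.next_study() for _ in range(3)]
print(reading_order)
```

In this sketch the study with the highest score is read first, and studies with equal scores keep their arrival order.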
8. Other dimensions that differentiate the AI algorithms are:
· Is the algorithm “locked” or is it continuously adaptive? An example of a locked algorithm is the first CADe application for digital mammography: its algorithm was locked, and it is still basically the same as when the FDA cleared its initial filing in 1996. An adaptive algorithm will continue to learn and supposedly improve.
· What is the reader paradigm? AI can serve as the first reader, which then possibly determines the triage; as a concurrent reader, e.g. doing image segmentation or annotation while a physician is looking at an image; as a second reader, such as used to replace a double read for mammography; or as a fully autonomous reader with no human reader involved. The first clearance for a fully autonomous AI application, based on having a better specificity and sensitivity than a human reader, was for diabetic retinopathy, which was cleared in January of 2019.
· What is the oversight? Is there no oversight, or is it sporadic or continuous? Note that this is different from the reader paradigm: a fully autonomous AI application might still require regular oversight as part of the QA checking and post-market surveillance, especially if the algorithm is not locked but adaptive.
inconsistent and unclear. The majority of the products, i.e. more than 60 percent, are cleared under the PACS product code (LLZ), as that is the most logical place for any image processing and analysis related filings. The remainder is cleared under 6 different CAD categories (QAS, QFM, QDQ, POK, QBS, and the most recent QJU) and a handful of others. If a vendor wants to file a new algorithm, the easiest path is to convince the FDA that it fits under LLZ, as there are many predicates and a lot of examples, assuming that the FDA approves that approach. I would assume that the FDA wants to steer new submissions towards the new classifications; however, as you can see from the chart, there are very few predicates, sometimes only a single one.
10. Choosing the correct size and type of dataset to use for the learning is challenging:
· There are no guidelines on the number of cases to be included in the dataset that is used for the algorithm to learn and to validate its implementation. The unofficial FDA position is that the data should be “statistically significant,” which means that intensive interaction with the FDA is required to make sure the dataset meets its criteria.
· Techniques and image quality vary a lot between images, to the extent that certain images might not even be useful as part of the dataset.
· One needs to make sure that the dataset is representative of the body part, disease, and population characteristics. It has been acknowledged that a dataset from e.g. Chinese citizens might not be applicable to a population in the US, Europe, or Africa. In addition, it became clear that an algorithm might need to be retrained based on the type of institution (compare a patient population at a VA medical center with the patients at a clinic in a suburb) and even geographic location (compare Cleveland with Portland, the youth in Cleveland being the most obese in all of the US).
· There is a big difference between manufacturers in how they represent their data. This requires normalizing and/or preparing the data to make sure the algorithm can work on it. Even for CR/DR there are different detector/plate characteristics, different noise patterns, different image processing applied by the vendor, different LUTs applied, etc.
The figure shows the intensity values for different MRIs.
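As a rough illustration of what normalizing data from different manufacturers can look like, the sketch below applies a per-image z-score normalization, which is one common preprocessing technique and not necessarily what any of the workshop speakers used. The function name `zscore_normalize` and the intensity values for the two scanners are made up for the example.

```python
from statistics import mean, pstdev


def zscore_normalize(intensities):
    """Rescale intensities to zero mean and unit standard deviation."""
    mu = mean(intensities)
    sigma = pstdev(intensities)
    if sigma == 0:
        # A completely flat image carries no contrast; map it to all zeros.
        return [0.0 for _ in intensities]
    return [(value - mu) / sigma for value in intensities]


# Hypothetical pixel values: same anatomy, but scanner B reports values
# on a scale ten times larger than scanner A.
scanner_a = [100, 200, 300, 400]
scanner_b = [1000, 2000, 3000, 4000]

norm_a = zscore_normalize(scanner_a)
norm_b = zscore_normalize(scanner_b)
print(norm_a)
print(norm_b)
```

After normalization both lists hold essentially the same values, so an algorithm trained on one scanner's scale is not thrown off by the other's. Real-world differences (noise patterns, vendor image processing, LUTs) are not linear scalings like this and would need more than a z-score.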
11. There should be a clear distinction between the three different datasets that are used for different purposes:
· The training dataset, which is used to train the AI algorithm.
· After the initial training is done, one would use a tuning dataset to optimize the algorithm.
· As soon as the algorithm development is complete, it will become part of the overall architecture and is verified with an integration test, which tests against the detailed design specs. This is followed by a system test that verifies against the system requirements, and lastly by a final Validation and Verification, which tests against the user requirements using a separate test dataset.
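The separation between the three datasets can be sketched in a few lines. The example below splits at the patient level, so that images from one patient never end up in more than one dataset. That is my own assumption about good practice and was not stated at the workshop, and the 70/15/15 proportions and the function name `split_by_patient` are purely illustrative.

```python
import random


def split_by_patient(patient_ids, train_frac=0.70, tune_frac=0.15, seed=0):
    """Assign each patient to exactly one of the train/tune/test datasets."""
    unique = sorted(set(patient_ids))          # sort first for reproducibility
    random.Random(seed).shuffle(unique)
    n_train = round(len(unique) * train_frac)
    n_tune = round(len(unique) * tune_frac)
    return {
        "train": set(unique[:n_train]),
        "tune": set(unique[n_train:n_train + n_tune]),
        "test": set(unique[n_train + n_tune:]),   # held back until final validation
    }


patients = [f"patient-{i:03d}" for i in range(100)]   # hypothetical IDs
splits = split_by_patient(patients)
print({name: len(members) for name, members in splits.items()})
```

The fixed seed makes the split repeatable, which helps when the same test dataset has to be shown to be untouched by training and tuning.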
12. AI clearance changed the traditional process in that pre-clearance testing and validation as well as post-market surveillance are now required. The pre-clearance part is covered by the pre-submission, also known as the Q-Submission program, which has a separate set of guidelines and is extensively used by AI vendors. It is basically a set of meetings with the FDA focused on determining that the clinical testing is statistically significant and that the filing strategy is acceptable. Last year, there were 2200 pre-submissions out of 4000 submissions, which shows that it has become common practice. The FDA strongly encourages this approach.
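To give a feel for what “statistically significant” can mean for the size of a clinical test set, here is a back-of-the-envelope sketch based on the standard normal-approximation formula for a proportion, n = z² · p(1 − p) / d². This is textbook statistics, not FDA guidance. The expected sensitivity, the margin, and the function name `cases_needed` are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def cases_needed(expected_sensitivity, half_width, confidence=0.95):
    """Positive cases needed so the confidence interval around the
    measured sensitivity is no wider than +/- half_width."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95 percent
    p = expected_sensitivity
    return math.ceil(z ** 2 * p * (1 - p) / half_width ** 2)


# Assumed: 90 percent expected sensitivity, measured to within +/- 5 points.
positives = cases_needed(0.90, 0.05)
print(positives)  # 139

# Tightening the margin to +/- 2 points raises the requirement sharply.
print(cases_needed(0.90, 0.02))  # 865
```

Note that these are confirmed positive cases only. For a rare disease, collecting that many can mean screening a far larger population, which is exactly the data problem described earlier.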
The post-market surveillance is very important for non-locked algorithms, i.e. the ones that are self-learning and supposedly continuously improving. The challenge is to make sure that the algorithms are getting better and not worse, which requires post-market surveillance. There was a lot of discussion about the post-market surveillance and a consensus that it is needed but there were no guidelines available (yet) on how this would work.
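Since no guidelines exist yet, the following is only a sketch of one way a vendor or hospital could check whether an adaptive algorithm is getting worse: compare a rolling agreement rate between the AI finding and the confirmed finding against the baseline established at clearance. The class name `PerformanceMonitor`, the window size, and the tolerance are all hypothetical choices, not a prescribed method.

```python
from collections import deque


class PerformanceMonitor:
    """Tracks agreement between AI findings and confirmed findings
    over the most recent cases (hypothetical surveillance sketch)."""

    def __init__(self, baseline_accuracy, window_size=100, tolerance=0.05):
        self.baseline_accuracy = baseline_accuracy
        self.tolerance = tolerance
        self._outcomes = deque(maxlen=window_size)   # oldest cases drop out

    def record(self, ai_finding, confirmed_finding):
        self._outcomes.append(ai_finding == confirmed_finding)

    def rolling_accuracy(self):
        if not self._outcomes:
            return None
        return sum(self._outcomes) / len(self._outcomes)

    def has_degraded(self):
        accuracy = self.rolling_accuracy()
        if accuracy is None:
            return False
        return accuracy < self.baseline_accuracy - self.tolerance


monitor = PerformanceMonitor(baseline_accuracy=0.90, window_size=10)
for _ in range(9):
    monitor.record(ai_finding=True, confirmed_finding=True)
monitor.record(ai_finding=True, confirmed_finding=False)
degraded_before = monitor.has_degraded()

# Three misses in a row push three older correct cases out of the window.
for _ in range(3):
    monitor.record(ai_finding=False, confirmed_finding=True)
degraded_after = monitor.has_degraded()
print(degraded_before, degraded_after)
```

A real surveillance program would also need a reliable source for the confirmed finding (for example, a follow-up diagnosis), which is often the hardest part.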
13. There are a couple of applicable documents that are useful when looking to get FDA clearance for an AI application: the Q-submission process, the De Novo classification request, and the regulatory framework discussion paper.
The FDA initiative to have an open discussion in the form of
a workshop was an excellent idea and brought forth a lot of discussion and
valuable information. You can find a link to the many presentations at their website.
It was obvious that the regulatory framework for AI applications is still very
much under discussion. Key take-aways are the use of pre-submissions to have an
early dialogue with the FDA about the acceptable clinical data used for
training and validation and about the regulatory product classification and
approach, as well as the need for a post-market assessment, which is not
defined (yet), especially for adaptive AI algorithms.
The de novo approach will also be very useful for the “to-be-defined” product definitions, and it can be expected that the list of product classifications will grow as more products are introduced. AI is here to stay, and the sooner the FDA has a well-defined process and approach, the faster these products can make an impact on the healthcare industry and patient care.