A systematic review conducted in the UK, the first of its kind, examined whether the diagnostic performance of AI is equivalent to that of medical professionals. Its results were published in ‘The Lancet Digital Health’.
Researchers included 14 studies whose reporting and design were of high enough quality to support their analysis. The diagnostic performance of AI was evaluated on two parameters: sensitivity and specificity. Sensitivity is the likelihood that a diagnostic tool returns a positive result in individuals who have the condition, while specificity is the likelihood that it returns a negative result in individuals who do not.
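The two measures defined above can be computed directly from a diagnostic test's confusion matrix. The sketch below uses invented counts purely for illustration; they are not figures from the review.

```python
# Minimal sketch of sensitivity and specificity, assuming a simple
# binary diagnostic test. All counts below are hypothetical.

def sensitivity(true_pos: int, false_neg: int) -> float:
    """Probability of a positive result in people who have the disease."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    """Probability of a negative result in people who do not have the disease."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts: 90 true positives, 10 false negatives,
# 85 true negatives, 15 false positives.
print(sensitivity(90, 10))   # 0.9
print(specificity(85, 15))   # 0.85
```

A test can score highly on one measure and poorly on the other, which is why the review evaluated both rather than a single accuracy figure.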
The studies evaluated both methods on the same test sample and found that AI's accuracy in detecting diseases, ranging from eye conditions to cancers, was equivalent to that of medical professionals, though not substantially better.
However, the researchers noted several limitations of the study. Diagnostic accuracy was determined in an isolated setting that could not replicate normal clinical practice. Most studies compared performance on datasets, whereas a high-quality study of diagnostic performance requires comparisons made in people. The datasets also provided no information on missing data, if any. Further limitations included inconsistent terminology, the absence of pre-set thresholds for sensitivity and specificity analysis, and the lack of out-of-sample validation.
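The last limitation above, missing out-of-sample validation, can be illustrated with a toy experiment: tune a decision threshold on one split of the data, then check how it performs on data it never saw. Everything here is synthetic and hypothetical, not drawn from the review.

```python
# Sketch of out-of-sample validation, assuming a toy threshold
# "classifier" on synthetic data. Illustrative only.
import random

random.seed(42)
# Synthetic cases: a score in [0, 1] loosely correlated with the true label.
data = [(s, s + random.gauss(0, 0.3) > 0.5)
        for s in (random.random() for _ in range(400))]

train, held_out = data[:300], data[300:]

def accuracy(threshold: float, rows) -> float:
    """Fraction of rows where (score > threshold) matches the label."""
    return sum((s > threshold) == y for s, y in rows) / len(rows)

# Tune the threshold on the training split only...
best = max((t / 100 for t in range(101)),
           key=lambda t: accuracy(t, train))

# ...then report performance on cases the model never saw.
print(f"in-sample accuracy:     {accuracy(best, train):.2f}")
print(f"out-of-sample accuracy: {accuracy(best, held_out):.2f}")
```

The in-sample figure is typically the more flattering of the two, which is exactly the kind of optimism the reviewers cautioned against when studies skip held-out evaluation.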
There is a constant tug of war between adopting new, potentially life-saving diagnostic tools and the need to develop high-quality evidence that benefits patients and healthcare systems in clinical practice. The researchers concluded that sound study design is needed to avoid bias that could distort results; such biases can lead to inflated claims about AI's performance that would not hold up in real-life settings.
It is essential that AI algorithms be tested for diagnostic performance vis-à-vis alternative diagnostic tests in randomized controlled trials. So far, no study has followed up on the decisions of an AI algorithm to ascertain their effect on patients, for example on discharge times from hospital, timeliness of treatment, or survival rates.