Table 2:

Detailed summary of levels of evidence^a

| Levels of Evidence | Element | Description | Types of Evidence | Significance |
|---|---|---|---|---|
| Level 1 | Clinical efficacy | Clinical efficacy is the assessment of how the AI tool impacts patient care and health care outcomes | AI tool has been used in at least 1 prospective study, randomized clinical trial, or meta-analysis demonstrating potential for improved patient care and health care outcomes, including improved mortality, quality of life, morbidity, or reduced health care cost | Although measuring clinical efficacy and added value for an early AI technology is challenging, it remains the single most important feature for clinical success and adoption |
| Level 2 | Bias and error mitigation | Biases of different types invariably exist in all data and can lead to AI modeling errors when applied to patients in different clinical settings | AI model can adapt to at least 2 separate institutions different from where it was initially developed (2 independent retrospective studies); AND peer-reviewed results from those retrospective studies are available describing the various populations used to test the AI model, including age ranges, sex, and types of scanners; AND AI company has a process to continuously incorporate feedback and improve its model (postdeployment monitoring) | AI-related errors in clinical practice may be harmful to patients; thus, the AI tool should be tested at multiple different sites, with differing patient demographics, disease prevalence, and imaging vendors, to determine operational characteristics, generalizability, and potential pitfalls. As clinical and patient care standards are constantly evolving, AI models may need routine surveillance and updates for performance and data drift |
| Level 3 | Reproducibility and generalizability | AI-enabled tool can be applied to different clinical settings while demonstrating consistent high-quality results | At least 2 retrospective studies showing that the AI tool has performance characteristics similar to those of alternative methods in the literature; AND at least 1 independent study, from an institution different from where the AI tool was developed, demonstrating that the AI tool can adapt to at least 1 different institution | A multi-institution approach supports reproducibility and generalizability of the AI tool |
| Level 4 | Technical efficacy | Technical efficacy is the assessment that the AI model correctly performs the task that it was trained to do | AI model performance has been shown in 2 retrospective studies to have potential clinical impact compared with similar or alternative methods in the literature; AND these retrospective studies can be from the same institution | Recent investigations suggest that less than 40% of AI models have peer-reviewed evidence available on their efficacy [6] |
| Level 5A | Data quality and AI model development with external testing | AI models are prone to overfitting, and an external test data set should be used during development to report final performance metrics | Level 5B evidence as described, with an external test set used for performance validation | Inclusion of an external data set during the development phase supports the generalizability of the AI tool |
| Level 5B | Data quality and AI model development with internal testing | AI company should provide peer-reviewed information about the characteristics of the data used to develop the AI model. AI company can explain how the AI model makes decisions that are relevant to patient care | One retrospective study showing the following: peer-reviewed results detailing the inclusion/exclusion criteria, source, and type of data used to train, validate, and test the AI model; AND peer-reviewed results describing how the AI model was developed, including use of a standard of reference that is widely accepted for the intended clinical task; AND no external test set was used for final performance reporting | Data characteristics will influence the suitability and applicability of the AI model to the target patient population of interest. Selection of a high-quality standard of reference is important for accurately comparing the peak performance of the AI model with that of current clinical practice |
| Level 6 | Interoperability and integration into the IT infrastructure | AI software should integrate seamlessly into the hospital information system, radiology information system, and PACS to be clinically useful | AI company can provide a plan, including interoperability standards, for integration into the existing radiology and hospital digital information systems. AI company can provide on-site demonstration of clinical integration and potential impact on workflow before full deployment | Successful clinical implementation of an AI tool requires close collaboration between the AI company and site experts, including radiologists, referring physicians, data scientists, and information technologists. Real-time demonstration is an important mechanism for identifying potential site-specific problems |
| Level 7 | Legal and regulatory frameworks | Patient consent, privacy, and confidentiality laws will vary depending on state, local, and institutional regulations | AI-enabled tool is compliant with current patient data protection, security, privacy, HIPAA, and government regulations | AI companies, health care systems, and radiologists are key gatekeepers of patient autonomy, privacy, and safety |

^a Appropriate reporting of AI model performance will depend on the task; however, examples of relevant statistical measures include ROC, sensitivity, specificity, and positive and negative predictive values, among others.
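
As a rough illustration of the statistical measures named in the footnote, the sketch below computes sensitivity, specificity, and positive and negative predictive values from the four cells of a binary confusion matrix. It is a minimal sketch, not a method taken from the source: the names `ConfusionCounts`, `safe_ratio`, and `diagnostic_metrics` are hypothetical, and the counts in the usage lines are made-up numbers chosen only to exercise the formulas. ROC analysis is omitted because it requires continuous model scores rather than a single set of counts.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical container)."""
    tp: int  # true positives: disease present, AI tool flags it
    fp: int  # false positives: disease absent, AI tool flags it
    tn: int  # true negatives: disease absent, AI tool does not flag it
    fn: int  # false negatives: disease present, AI tool misses it


def safe_ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, Optional[float]]:
    """Compute the standard definitions of four diagnostic accuracy measures."""
    return {
        "sensitivity": safe_ratio(c.tp, c.tp + c.fn),  # TP / (TP + FN)
        "specificity": safe_ratio(c.tn, c.tn + c.fp),  # TN / (TN + FP)
        "ppv": safe_ratio(c.tp, c.tp + c.fp),          # TP / (TP + FP)
        "npv": safe_ratio(c.tn, c.tn + c.fn),          # TN / (TN + FN)
    }


# Illustrative, made-up counts (not data from any study).
counts = ConfusionCounts(tp=80, fp=10, tn=90, fn=20)
metrics = diagnostic_metrics(counts)
```

Unlike sensitivity and specificity, the two predictive values depend on disease prevalence in the test population, so the same model evaluated at sites with different prevalence can yield different PPV and NPV.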