Table 2:

Detailed summary of levels of evidenceᵃ

Levels of Evidence	Element	Description	Types of Evidence	Significance
Level 1	Clinical efficacy	Clinical efficacy is the assessment of how the AI tool affects patient care and health care outcomes	AI tool has been used in at least 1 prospective study, randomized clinical trial, or meta-analysis demonstrating potential for improved patient care and health care outcomes, including improvements in mortality, quality of life, or morbidity, or reduced health care cost	Although measuring the clinical efficacy and added value of an early AI technology is challenging, it remains the single most important factor for clinical success and adoption
Level 2	Bias and error mitigation	Biases of different types invariably exist in all data and can lead to AI modeling errors when the model is applied to patients in different clinical settings	AI model can adapt to at least 2 separate institutions different from where it was initially developed (2 independent retrospective studies) AND Peer-reviewed results from those retrospective studies are available describing the various populations used to test the AI model, including age ranges, sex, and types of scanners AND AI company has a process to continuously incorporate feedback and improve its model (postdeployment monitoring)	AI-related errors in clinical practice may harm patients; thus, the AI tool should be tested at multiple sites with differing patient demographics, disease prevalence, and imaging vendors to determine operational characteristics, generalizability, and potential pitfalls. As clinical and patient care standards are constantly evolving, AI models may need routine surveillance and updates to address performance and data drift
Level 3	Reproducibility and generalizability	AI-enabled tool can be applied to different clinical settings while demonstrating consistently high-quality results	At least 2 retrospective studies showing that the AI tool has performance characteristics comparable to those of similar or alternative methods in the literature AND At least 1 independent study, conducted at an institution different from where the AI tool was developed, demonstrating that the AI tool can adapt to at least 1 different institution	A multi-institution approach supports reproducibility and generalizability of the AI tool
Level 4	Technical efficacy	Technical efficacy is the assessment that the AI model correctly performs the task it was trained to do	AI model performance has been shown in 2 retrospective studies to have potential clinical impact compared with similar or alternative methods in the literature AND These retrospective studies may be from the same institution	Recent investigations suggest that fewer than 40% of AI models have peer-reviewed evidence available on their efficacy⁶
Level 5A	Data quality and AI model development with external testing	AI models are prone to overfitting; therefore, an external test data set should be used during development to report final performance metrics	Level 5B evidence as described below, with an external test set used for performance validation	Inclusion of an external data set during the development phase supports the generalizability of the AI tool
Level 5B	Data quality and AI model development with internal testing	AI company should provide peer-reviewed information about the characteristics of the data used to develop the AI model. AI company can explain how the AI model makes decisions that are relevant to patient care	One retrospective study showing the following: Peer-reviewed results detailing the inclusion/exclusion criteria, source, and type of data used to train, validate, and test the AI model AND Peer-reviewed results describing how the AI model was developed, including use of a standard of reference that is widely accepted for the intended clinical task AND No external test set was used for final performance reporting	Data characteristics will influence the suitability and applicability of the AI model to the target patient population. Selection of a high-quality standard of reference is important for accurately comparing the peak performance of the AI model with that of current clinical practice
Level 6	Interoperability and integration into the IT infrastructure	AI software should integrate seamlessly into the hospital information system, radiology information system, and PACS to be clinically useful	AI company can provide a plan, including interoperability standards, for integration into the existing radiology and hospital digital information systems. AI company can provide an on-site demonstration of clinical integration and potential impact on workflow before full deployment	Successful clinical implementation of an AI tool requires close collaboration between the AI company and site experts, including radiologists, referring physicians, data scientists, and information technologists. Real-time demonstration is an important mechanism for identifying potential site-specific problems
Level 7	Legal and regulatory frameworks	Requirements for patient consent, privacy, and confidentiality vary depending on state, local, and institutional regulations	AI-enabled tool is compliant with current patient data protection, security, privacy, HIPAA, and government regulations	AI companies, health care systems, and radiologists are key gatekeepers of patient autonomy, privacy, and safety

ᵃ Appropriate reporting of AI model performance will depend on the task; however, examples of relevant statistical measures include ROC curve analysis, sensitivity, specificity, and positive and negative predictive values, among others.
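The statistical measures named in the footnote can be made concrete with a short sketch. The Python below is illustrative only and assumes a binary task (finding present vs absent); the names `ConfusionCounts`, `binary_metrics`, and `roc_auc` are hypothetical helpers, not part of any framework described in the table, and the counts and scores in the usage example are invented for demonstration. The AUC is computed with the pairwise (Mann-Whitney) formulation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for binary confusion-matrix counts."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def binary_metrics(c: ConfusionCounts) -> dict:
    """Sensitivity, specificity, PPV, and NPV from confusion-matrix counts."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true-positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true-negative rate
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
    }


def roc_auc(scores: list, labels: list) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 = positive (finding present), 0 = negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented counts: 100 diseased and 900 nondiseased cases (10% prevalence).
    counts = ConfusionCounts(tp=90, fp=100, tn=800, fn=10)
    print(binary_metrics(counts))
    print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

In this invented example, sensitivity is 0.90 and specificity is about 0.89, yet PPV is only about 0.47 because positives make up just 10% of cases. Sensitivity and specificity do not depend on prevalence, whereas PPV and NPV do, which is one reason the same model can show different predictive values at sites with different disease prevalence.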