Abstract
BACKGROUND AND PURPOSE: Brain imaging plays an important role in investigating patients with cognitive decline and ruling out secondary causes of dementia. This study compares the diagnostic value of quantitative hippocampal volumes derived from automated volumetric software and structured scoring scales in differentiating Alzheimer disease, mild cognitive impairment, and subjective cognitive decline.
MATERIALS AND METHODS: Retrospectively, we reviewed images and medical records of adult patients who underwent MR imaging with a dementia protocol (2018–2021). Patients with postscanning diagnoses of Alzheimer disease, mild cognitive impairment, and subjective cognitive decline based on the International Statistical Classification of Diseases and Related Health Problems, 10th revision, were included. Diagnostic performances of automated normalized total hippocampal volume and structured manually assigned medial temporal atrophy and entorhinal cortical atrophy scores were assessed using multivariate logistic regression and receiver operating characteristic curve analysis.
RESULTS: We evaluated 328 patients (Alzheimer disease, n = 118; mild cognitive impairment, n = 172; subjective cognitive decline, n = 38). Patients with Alzheimer disease had lower normalized total hippocampal volume (median, 0.35%), higher medial temporal atrophy (median, 3), and higher entorhinal cortical atrophy (median, 2) scores than those with subjective cognitive decline (P < .001) and mild cognitive impairment (P < .001). For discriminating Alzheimer disease from subjective cognitive decline, an entorhinal cortical atrophy cutoff value of 2 had a higher specificity (87%) compared with normalized total hippocampal volume (74%) and medial temporal atrophy (66%), but a lower sensitivity (69%) than normalized total hippocampal volume (84%) and medial temporal atrophy (84%). In discriminating Alzheimer disease from mild cognitive impairment, an entorhinal cortical atrophy cutoff value of 3 had a specificity (66%), similar to that of normalized total hippocampal volume (67%) but higher than medial temporal atrophy (54%), and its sensitivity (69%) was also similar to that of normalized total hippocampal volume (71%) but lower than that of medial temporal atrophy (84%).
CONCLUSIONS: Entorhinal cortical atrophy and medial temporal atrophy may be useful adjuncts in discriminating Alzheimer disease from subjective cognitive decline, with reduced cost and implementation challenges compared with automated volumetric software.
ABBREVIATIONS:
- AD: Alzheimer disease
- ERiCA: entorhinal cortical atrophy
- ICD-10: International Statistical Classification of Diseases and Related Health Problems, 10th revision
- IQR: interquartile range
- MCI: mild cognitive impairment
- MTA: medial temporal lobe atrophy
- NTHV: normalized total hippocampal volume
- ROC: receiver-operating characteristic
- SCD: subjective cognitive decline
Dementia is characterized by progressive loss of brain structure and function, resulting in a loss of intellectual abilities and interference with social or occupational functions.1 It is estimated that about 50 million people worldwide are affected by dementia, and this number is expected to triple to 150 million by 2050.2 The most common cause of dementia is Alzheimer disease (AD).
The diagnosis of AD is often based on clinical history, neuropsychological evaluation, CSF markers, and imaging tests such as PET and MR imaging. Brain MR imaging is routinely performed in patients being evaluated for cognitive decline, and it is helpful in ruling out secondary causes of dementia. Subjective cognitive decline (SCD), mild cognitive impairment (MCI), and AD are not mutually exclusive states, and MCI has been described as a transitional phase between SCD and AD, with overlapping boundaries; some patients eventually convert to AD, while others remain stable.3
Recently, there has been increased clinical availability of automated software tools such as NeuroQuant (Version 2.3.0; Cortechs, San Diego, California) and icobrain (icometrix, Leuven, Belgium) that measure overall and regional brain volumes and compare them with those of age-matched healthy controls. These tools provide an abundance of volumetric data and the potential for longitudinal tracking, but they have not been widely adopted in the clinical evaluation of dementia because of the wide variation in their technical and clinical validation, the lack of access to software algorithms, and the difficulty in integrating these tools into the clinical reporting workflow.4,5
Attempts have been made to develop validated semiquantitative visual assessment tools that use clear definitions of what constitutes atrophy along with visual examples of each score, including the medial temporal atrophy (MTA)6 and entorhinal cortical atrophy (ERiCA) scores.7 These scores can reduce reporting variability and facilitate communication between radiologists and referring clinicians, and they have been shown, in some studies, to have high diagnostic accuracy.7
The primary purpose of our study was to compare the diagnostic value of quantitative regional brain volumes derived from NeuroQuant and structured scoring scales in the evaluation of AD.
MATERIALS AND METHODS
Study Design and Patient Selection
This was a Health Insurance Portability and Accountability Act–compliant retrospective review that was approved by our institutional review board. The need for patient-informed consent was waived. We reviewed imaging studies and medical records of adult patients from 2018 to 2021 at 5 centers that fall under the umbrella of a single health care institution.
The inclusion criteria were adult patients who had undergone MR imaging with the dementia protocol within the search period and who had postscanning International Statistical Classification of Diseases and Related Health Problems, 10th revision (ICD-10) diagnoses of AD or MCI, including those having SCD. Exclusion criteria were patients with missing NeuroQuant quantitative parameters, patients with other neurodegenerative ICD-10 diagnoses (such as Parkinson disease, frontotemporal dementia, Pick disease), and those with incomplete demographic data (race and/or highest education). The study participants' flow chart is shown in Fig 1.
Fig 1. Flow chart of study participants. Only diagnoses made after MR imaging were considered for this study. PD indicates Parkinson disease; FTD, frontotemporal dementia.
Clinical Scenarios and Diagnoses
Requests for MR imaging with the dementia protocol were received from neurologists and primary care physicians within the institution's health care network and also from providers outside the network. These included requests from first-line memory clinics, doctor's offices, and tertiary referral centers. All patients were eventually reviewed by board-certified neurologists with additional subspecialty certification in Behavioral Neurology and Neuropsychiatry from the United Council for Neurologic Subspecialties. The final diagnosis was based on a comprehensive neurologic examination, detailed cognitive testing, review of laboratory tests and brain MR imaging scans, and additional CSF biomarker testing as appropriate. The diagnostic guidelines used are consistent with the 2011 National Institute on Aging–Alzheimer's Association revised diagnostic criteria for AD.8 In the selection of patients, we considered only those diagnoses made postscanning to be valid for our research.
Imaging Acquisition and Analysis
Brain MR imaging was performed on one of several 1.5T or 3T MR imaging scanners using our institutional dementia MR imaging protocol. All examinations included a volumetric T1-weighted sequence with 1-mm isotropic voxels in addition to standard brain MR imaging sequences. The exact sequence parameters varied among scanners due to scanner differences, but a representative MPRAGE sequence was performed with 1-mm section thickness, 1-mm in-plane resolution, FOV = 256 × 256 mm, TR = 2.3 seconds, TE = 3.17 ms, bandwidth = 210 Hz. Images were acquired in the sagittal plane and reformatted to the axial and coronal planes. Assessment of MTA and ERiCA scores was performed on coronal images by the interpreting board-certified and Certificate of Added Qualification–certified neuroradiologists according to published guides.6,7 To summarize, the MTA score is a quantification of temporal atrophy using the hippocampal height and width of neighboring CSF space, with scores ranging from 0 (no atrophy) to 4 (severe atrophy).6 The ERiCA score is a quantification of entorhinal cortex atrophy assessed at the level of the mammillary bodies, with scores ranging from 0 (normal) to 3 (severe atrophy).7 Quantitative assessment was performed using NeuroQuant, which performed volumetric segmentation on the volumetric T1-weighted series and output quantitative volumes and ratios to the PACS. All imaging analyses were performed at the time of clinical interpretation, and no post hoc analysis was performed, to simulate a realistic clinical environment. Figure 2 shows examples of MR images obtained from NeuroQuant for patients with AD, MCI, and SCD.
Fig 2. Coronal T1 noncontrast MR images with automated segmentation overlay obtained from NeuroQuant showing hippocampal volumes (yellow arrow) for patients with AD (A), MCI (B), and SCD (C). Note the marked hippocampal atrophy for AD compared with MCI and SCD.
Outcome and Predictor Variables
The outcome variables used were AD, MCI, and SCD. Normalized total hippocampal volume (NTHV), maximum MTA scores, and maximum ERiCA scores were used as predictor variables. NTHV was calculated by taking the right and left hippocampal volumes generated separately by NeuroQuant, summing the 2 values, and expressing the resultant value as a percentage of each patient's intracranial volume (ICV): NTHV = [(right hippocampal volume + left hippocampal volume)/ICV] × 100%. Values generated by NeuroQuant are normalized to an age-specific cohort. MTA and ERiCA scores were assigned to each temporal lobe separately by the interpreting radiologists at the time of clinical interpretation using a structured visual scoring routine. Scores were obtained retrospectively from the radiology report. The higher value of the right and left MTA or ERiCA scores, which we termed maximum MTA or maximum ERiCA, was used for analysis.
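As a minimal sketch of the arithmetic described above, the following Python snippet computes NTHV as a percentage of intracranial volume and takes the higher of the right and left visual scores. The class name, field names, and numeric values are hypothetical and chosen only for illustration; they are not NeuroQuant output fields or study data.

```python
from dataclasses import dataclass


@dataclass
class HippocampalMeasurement:
    """Hypothetical container; field names are illustrative only."""
    right_hippocampus_ml: float
    left_hippocampus_ml: float
    intracranial_volume_ml: float


def normalized_total_hippocampal_volume(m: HippocampalMeasurement) -> float:
    """Return NTHV: (right + left hippocampal volume) / ICV, as a percentage."""
    if m.intracranial_volume_ml <= 0:
        raise ValueError("intracranial volume must be positive")
    total = m.right_hippocampus_ml + m.left_hippocampus_ml
    return 100.0 * total / m.intracranial_volume_ml


def maximum_score(right: int, left: int) -> int:
    """Return the higher of the right and left scores (maximum MTA or ERiCA)."""
    return max(right, left)


# Made-up volumes, used only to illustrate the calculation.
example = HippocampalMeasurement(
    right_hippocampus_ml=2.6,
    left_hippocampus_ml=2.8,
    intracranial_volume_ml=1500.0,
)
nthv = normalized_total_hippocampal_volume(example)
print(f"NTHV = {nthv:.2f}%")
print("maximum MTA =", maximum_score(right=2, left=3))
```

With these illustrative inputs, the summed hippocampal volume of 5.4 mL over an ICV of 1500 mL gives an NTHV of 0.36%.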
Covariates
Covariates were sex, race, highest education attained, age at diagnosis, a history of arterial hypertension, and a history of diabetes mellitus.
Statistical Analysis
The χ2 test was used to determine associations between categoric variables and the disease outcomes. The Kruskal-Wallis test was used to determine associations between continuous variables and disease outcomes because observations in these variables assumed a non-normal distribution based on the Shapiro-Wilk test for normality. Differences between comparison groups (AD versus MCI, AD versus SCD, and MCI versus SCD) were calculated in a post hoc analysis. Linear correlational analysis of quantitative and qualitative measures was assessed using the Spearman correlation. Multivariate logistic regression analyses were performed with a stepwise addition of predictors. Diagnostic performances of predictor variables were assessed by receiver operating characteristic (ROC) analyses, and the cutoff value for each parameter was calculated using the maximum Youden index. An α level of .05 was set as the level of significance. SAS software, Version 9.4 (SAS Institute, Cary, North Carolina) was used for all analyses.
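To illustrate how a cutoff can be selected with the maximum Youden index (J = sensitivity + specificity − 1), the sketch below scans every observed score as a candidate threshold. It is not the SAS procedure used in the study; the function names and the toy data are assumptions made for illustration, and the code assumes that higher scores indicate disease (as with MTA and ERiCA). For a measure such as NTHV, where lower values indicate disease, the scores could be negated first.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Call a case positive when score >= cutoff (labels: 1 = disease, 0 = comparison)."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both groups must contain at least one patient")
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1."""
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        j = sens + spec - 1.0
        # Keep the first (lowest) cutoff that attains the largest J.
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    if best is None:
        raise ValueError("scores must not be empty")
    return best[1], best[2], best[3]


# Toy visual scores and diagnoses; not study data.
scores = [0, 1, 1, 2, 2, 2, 3, 3]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, sens, spec = youden_cutoff(scores, labels)
print(f"cutoff={cutoff}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

For this toy data set, a cutoff of 2 gives the largest J (sensitivity 1.00, specificity 0.75). When several cutoffs tie on J, this sketch keeps the lowest one, which is a design choice rather than a rule from the study.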
RESULTS
A total of 328 imaging studies were evaluated (118 patients with AD: median age, 78 years; interquartile range [IQR], 71–84 years; 172 patients with MCI: median age, 74 years; IQR, 69–78 years; and 38 patients with SCD: median age, 67 years; IQR, 54–78 years). The time from the MR imaging dementia protocol scan to the date of the ICD-10 diagnosis varied among the 3 groups (AD group: median time, 48 days; IQR, 15–98 days; MCI group: median time, 6 days; IQR, 3–62 days; and SCD group: median time, 20 days; IQR, 3–171 days).
Descriptive analyses indicated that only age at diagnosis, NTHV, MTA, and ERiCA scores were significantly different across the 3 outcome variables (Table 1 and Fig 3). In a post hoc analysis, patients with AD were older (median age, 78 years), had lower NTHVs (median, 0.35%), higher MTA scores (median, 3), and higher ERiCA scores (median, 2) than those with SCD (P < .001) and MCI (P < .001).
Fig 3. Boxplot graphs of the variables age at diagnosis (A), NTHV (B), maximum MTA (C), and maximum ERiCA (D) scores against the outcome variables AD, MCI, and SCD. The maximum and minimum values are represented at either end of the whiskers. The box represents the interquartile range (25th percentile to the 75th percentile), with the median represented by the line within the box, and the mean shown by the diamond. Outliers are shown by a circle. Maximum MTA and maximum ERiCA refer to the higher score of the right and left values of the respective parameters.
Table 1. Descriptive statistics
There was a fairly strong positive linear correlation between MTA and ERiCA (r = 0.82), and a moderate negative linear correlation between NTHV and the visual scales (MTA and ERiCA). The correlation between MTA and NTHV (r = −0.76) was found to be significantly higher than that between ERiCA and NTHV (r = −0.71) (P = .02).
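For readers unfamiliar with the Spearman coefficient reported here, the following sketch shows one common way to compute it: assign average ranks (so tied ordinal scores share a rank) and take the Pearson correlation of the ranks. The helper names and the toy values are assumptions for illustration, not the study's SAS code or data.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy MTA scores and NTHV percentages; not study data.
mta = [0, 1, 1, 2, 3, 4]
nthv = [0.52, 0.48, 0.45, 0.40, 0.36, 0.30]
rho = spearman(mta, nthv)
print(f"Spearman rho = {rho:.2f}")
```

Because the toy NTHV values fall as the MTA score rises, the coefficient is negative, mirroring the direction (though not the magnitude) of the relationship reported above.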
In multivariate logistic regression analyses, when we controlled for age at diagnosis, sex, race, highest educational level, history of arterial hypertension, and diabetes mellitus, only ERiCA remained as a significant predictor (P = .02) in discriminating AD from SCD. Regarding the differentiation of AD from MCI, NTHV was the only significant predictor (P < .001) in the adjusted model. In the discrimination of MCI from SCD, none of the predictors remained significant (Online Supplemental Data).
In the ROC analyses, the model containing ERiCA, MTA, NTHV, and age at diagnosis was used. All 3 predictors (NTHV, MTA, and ERiCA) were significantly better than chance in the discrimination of AD from SCD. Area under the curve values for each were the following: NTHV (0.83), ERiCA (0.83), and MTA (0.80) (Fig 4 and Online Supplemental Data). There was no statistically significant difference between NTHV versus MTA (P = .37), ERiCA versus MTA (P = .28), and NTHV versus ERiCA (P = .90). Regarding the differentiation of AD from MCI, all 3 predictors were significantly better than chance. Area under the curve values for each were the following: NTHV (0.75), ERiCA (0.71), and MTA (0.73). There was no statistically significant difference between NTHV versus MTA (P = .47), ERiCA versus MTA (P = .25), and NTHV versus ERiCA (P = .12) (Fig 5 and Online Supplemental Data). When MCI was compared with SCD, the confidence intervals of all 3 predictors included the value 0.5, implying that all 3 parameters are poor discriminators (Fig 6 and Online Supplemental Data).
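The area under the ROC curve can be read as the probability that a randomly chosen patient from the disease group has a higher score than a randomly chosen patient from the comparison group, with ties counted as one-half. A value of 0.5 therefore corresponds to chance. The sketch below computes this quantity directly from that definition; the function name and toy scores are assumptions for illustration and do not reproduce the study's SAS analysis or its confidence intervals.

```python
from typing import Sequence


def auc_pairwise(disease: Sequence[float], comparison: Sequence[float]) -> float:
    """AUC as the fraction of (disease, comparison) pairs ranked correctly.

    Assumes higher scores indicate disease; ties contribute one-half.
    """
    if not disease or not comparison:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for d in disease:
        for c in comparison:
            if d > c:
                wins += 1.0
            elif d == c:
                wins += 0.5
    return wins / (len(disease) * len(comparison))


# Toy visual scores; not study data.
ad_scores = [2, 2, 3, 3]
scd_scores = [0, 1, 1, 2]
auc = auc_pairwise(ad_scores, scd_scores)
print(f"AUC = {auc:.4f}")

# Identical score distributions give an AUC of 0.5 (no discrimination).
print(auc_pairwise([1, 2, 3], [1, 2, 3]))
```

In the toy example, 15 of the 16 possible pair comparisons (counting two ties as half each) favor the disease group, giving an AUC of 0.9375. The pairwise loop is quadratic in the group sizes, which is acceptable for a cohort of a few hundred patients.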
Fig 4. ROC curves for AD versus SCD. The overall model includes normalized total hippocampal volume, maximum MTA, maximum ERiCA, and age at diagnosis. Maximum MTA and maximum ERiCA refer to the higher score of the right and left values of the respective parameters.
Fig 5. ROC curves for AD versus MCI. The overall model includes normalized total hippocampal volume, maximum MTA, maximum ERiCA, and age at diagnosis. Maximum MTA and maximum ERiCA refer to the higher score of the right and left values of the respective parameters.
Fig 6. ROC curves for MCI versus SCD. The overall model includes NTHV, maximum MTA, maximum ERiCA, and age at diagnosis. Maximum MTA and maximum ERiCA refer to the higher scores of the right and left values of the respective parameters.
In the discrimination of AD from SCD, an ERiCA cutoff value of 2 had a lower sensitivity (69%) compared with NTHV (84%) and MTA (84%) but had a higher specificity (87%) than NTHV (74%) and MTA (66%) (Table 2). When AD was compared with MCI, an ERiCA cutoff value of 3 had a sensitivity (69%) similar to that of NTHV (71%) but lower than that of MTA (84%); the ERiCA value had a specificity (66%) similar to that of NTHV (67%) but was higher than that of MTA (54%) (Table 3).
Table 2. Predictor diagnostic performances: AD versus SCD
Table 3. Predictor diagnostic performances: AD versus MCI
DISCUSSION
Our study compared the diagnostic performance of quantitative measurement (derived from automated volumetric assessment) and visual inspection scales in the discrimination of AD from SCD and MCI. The quantitative measure used was NTHV, while MTA and ERiCA scores were used as visual-inspection parameters. The interrater reliability of the visual inspection parameters has been previously studied and shown to be good.7,9
As expected, the correlation coefficient between MTA and NTHV was significantly higher than that between ERiCA and NTHV. This is because MTA is a measure of hippocampal atrophy, whereas ERiCA is a measure of entorhinal cortical atrophy and not a direct assessment of hippocampal volume.6,8,10 This finding also explains why the OR of MTA in the prediction model differentiating AD from MCI loses statistical significance when NTHV is introduced into the model but remains significant when only ERiCA is introduced as a predictor variable in addition to the other covariates. Statistically significant higher ORs were seen for ERiCA (OR, ≥2.0) in the unadjusted and adjusted AD-versus-SCD prediction models, and a statistically significant OR (OR, 1.7) was demonstrated for MTA in the unadjusted AD-versus-MCI prediction model (MTA and ERiCA as predictors). Even though hippocampal atrophy (either by volume or MTA) is a hallmark of AD that has been demonstrated in multiple prior studies7,11-14 and also confirmed in our study, it was less predictive than expected in the AD-versus-SCD comparison but was discriminative where the difference in atrophy was expected to be smaller (AD versus MCI). These overlapping results potentially help to re-emphasize the clinically relevant point that SCD, MCI, and AD are not mutually exclusive states but overlap along a cognitive continuum. The entorhinal cortex has been identified as a distinct early marker of AD pathology before significant hippocampal atrophy,7,15-17 and this is reflected in the ERiCA results obtained in the AD-versus-SCD models.
An ERiCA cutoff value of 2 produced a sensitivity of 69% and a specificity of 87%, which are lower than those reported by Enkirch et al.7 Our results do, however, show the higher specificity of ERiCA compared with MTA in discriminating AD and SCD, supportive of their findings. The lower diagnostic performance of ERiCA in our study reflects what can be expected in a larger cohort with clinical interpretation from multiple reporting neuroradiologists in a routine setting instead of dedicated research reviewers. The sensitivity and specificity of MTA scores in discriminating AD from controls vary across studies, with sensitivity ranging from 57% to 100% and specificity ranging from 67% to 100%.6,7,12,18-20 In our study, we calculated the sensitivity and specificity of the MTA cutoff point for all ages (MTA, 2) in discriminating AD from SCD to be 84% and 66%, respectively.
In discriminating AD and MCI, our study produced an area under the curve value of 0.71, which is the same as that calculated by Traschütz et al,21 and ERiCA also showed higher specificity but lower sensitivity in the differentiation of AD from MCI as estimated by Roberge et al.22 Even though the ROC analyses indicate that all 3 diagnostic tests (NTHV, MTA, and ERiCA) are better than chance in differentiating AD from MCI, their sensitivity and specificity values as shown in the results indicate challenges in using only these parameters in the diagnosis of dementia.
Our study had limitations, including the retrospective study design and the small sample size of the SCD group. Also, the SCD group is not truly a healthy group because these individuals presented to the clinic with varied cognitive symptoms for which they underwent the MR imaging dementia protocol scan. Because our study relied on NeuroQuant, which we use only in the context of a referral for assessment of some form of cognitive impairment, we did not have NeuroQuant data on truly healthy individuals. Nevertheless, the SCD group may serve as a useful comparison group because it reflects a realistic sampling of the patient population who undergo the MR imaging dementia protocol. In addition, even though the brain MR imaging scan result is just one component used in the diagnosis of dementia, its impression could still possibly bias the final definitive diagnosis. Last, even though the NeuroQuant analyses were conducted at the time of performing the dementia scan protocol, interpreting neuroradiologists scored MTA and ERiCA on the basis of standardized criteria and a visual guide, independent of the NeuroQuant analyses and as per our institutional practice. However, because the reporting neuroradiologists were not completely blinded to the NeuroQuant analyses, the reported visual assessment scores could potentially have been influenced by the NeuroQuant results.
CONCLUSIONS
The structured scoring scale, ERiCA, had lower sensitivity but higher specificity compared with NTHV and MTA in the discrimination of AD from SCD. In a multivariable model, only ERiCA remained as an independent predictor. Our results support the inclusion of the ERiCA score in the radiologic assessment of MR imaging performed for suspected dementia, in addition to the more widely used MTA score, because these 2 scores, when used together, may be helpful supplements to the evaluation of dementia, especially if automated quantitative analysis is not available.
Footnotes
Disclosure forms provided by the authors are available with the full text and PDF of this article at www.ajnr.org.
- Received May 2, 2023.
- Accepted after revision October 4, 2023.
- © 2023 by American Journal of Neuroradiology