Abstract
BACKGROUND AND PURPOSE: MR imaging has a key role in predicting neurodevelopmental outcomes following neonatal hypoxic-ischemic encephalopathy (HIE). A novel MR imaging scoring system for hypoxic-ischemic brain injury was used in our patient population with the aim of assessing interobserver variability and developing subcategories for the severity of brain injury.
MATERIALS AND METHODS: We evaluated brain MR images of 252 infants who underwent hypothermia for HIE between 2014 and 2019. First, 40 infants were selected randomly to test interobserver variability. Discrepancies were identified during the assessment of the first 20 MR images. The remaining 20 MR images were scored after adjusting the scoring system. Second, we determined cutoff values for the severity of injury that were based on the percentiles of the total scores in the full cohort.
RESULTS: Interobserver agreement for the total score was excellent both before (intraclass correlation coefficient = 0.96; 95% CI, 0.89–0.99) and after the adjustment (intraclass correlation coefficient = 0.96; 95% CI, 0.89–0.98). The average of the differences and the agreement interval between the 2 readers decreased after the adjustment. The subcategories of brain injury were defined as follows: a total score of ≤4 (≤75th percentile) as normal, 5–10 (76th–90th percentile) as mild, 11–15 (91st–95th percentile) as moderate, and >15 (>95th percentile) as severe brain injury. The agreement on the classification of brain injury improved in the second epoch (weighted κ = 0.887 versus 0.723 in the first epoch).
CONCLUSIONS: The adjusted scoring system may lead to a higher degree of interrater agreement. The presented cutoff values may be used to determine the severity of brain injury in future clinical studies including infants with mild hypoxia-ischemia.
ABBREVIATIONS:
- HIE: hypoxic-ischemic encephalopathy
- ICC: intraclass correlation coefficient
- κW: weighted κ
- PLIC: posterior limb of internal capsule
- TH: therapeutic hypothermia
Hypoxic-ischemic encephalopathy (HIE) occurs in 2–3 per 1000 live term births in developed countries.1 To date, therapeutic hypothermia (TH) initiated within the first 6 hours of life and continued for 72 hours with a target central temperature of 33.5°C is the only available treatment to reduce the risk of death and neurodevelopmental impairment.2,3
The ability to predict neurodevelopmental outcomes following HIE allows parents and caregivers to optimize care beyond the neonatal period. MR imaging has a key role in predicting neurologic outcomes.4,5 Although many previously reported MR imaging scoring systems have been related to outcome,6-8 they were usually based on conventional sequences. The widely used scoring system of Barkovich et al,6 published before the hypothermic era, did not originally incorporate diffusion-weighted images, even though DWI has been recognized as the most reliable MR imaging sequence to assess injury during the first week after a hypoxic-ischemic event.4,9 Recently, Weeke et al10 described a novel and more detailed MR imaging scoring system for term infants with HIE, incorporating DWI and 1H-MR spectroscopy sequences as well as patterns of injury to the gray matter, white matter, and cerebellum to improve the predictive value of MR imaging studies in infants with HIE. The gray matter subscore was an independent predictor of adverse outcome at 2 years of age and at school age.10
Given that our inclusion criteria for infants to undergo TH at Brigham and Women's Hospital had been broadened, offering cooling to milder cases and infants born at >34 weeks of gestation, we wished to explore the application of this new scoring system in our TH cohort.
We applied the new scoring system to our diverse patient population with the aim of assessing the observer variability between 2 experienced readers. We identified discrepancies during the evaluation of the first 20 MR imaging scans and adjusted the scoring system of Weeke et al10 accordingly.
Second, we aimed to develop subcategories of severity from the total scores: normal brain and mild, moderate, and severe brain injury. Our hypothesis was that the adjusted scoring system would improve interobserver reliability and increase the ease and reliability of applying this scoring system as a new standard in the documentation of cerebral injury in the setting of hypoxic-ischemic encephalopathy.
MATERIALS AND METHODS
Patients
We have collected data, including imaging data, on 252 infants who underwent TH for neonatal encephalopathy between January 2014 and May 2019. We randomly selected 40 infants to assess the observer variability of the new MR imaging scoring system.10 This retrospective observational study was conducted at Brigham and Women's Hospital, Department of Pediatric Newborn Medicine, a single, tertiary-level neonatal intensive care unit. Institutional review board approval was obtained with a waiver of consent. The criteria for TH in our center are modified regional center–based criteria in which variables have been broadened from those used in the randomized clinical trials.11,12 The adaptations have included the following: 1) decreasing the gestational age criteria to >34 weeks; 2) increasing the inclusion pH from ≤7.0 to ≤7.1; 3) reducing the base excess for inclusion from ≥16 mEq/L to ≥10 mEq/L; and 4) providing therapeutic hypothermia to infants with mild hypoxic-ischemic encephalopathy on clinical examination, in addition to those with moderate or severe HIE. The stage of HIE was assigned on the basis of the modified Sarnat staging system after combined assessment by clinicians before the initiation of TH.13
MR Imaging
All infants underwent at least 1 cerebral MR imaging performed after TH within the first week of life. The second MR imaging was based on the decision of the clinical team caring for the infant. Only the first MR images obtained within the first week of life were analyzed in this study. All scans were performed on a 3T Siemens scanner (Siemens, Erlangen, Germany). The standard clinical imaging protocol included sagittal motion-corrected magnetization-prepared rapid acquisition of gradient echo T1-weighted images (TR = 2800 ms; TE = 2.75, 4.68, 6.54, and 8.4 ms; flip angle = 7°; voxel size = 1 × 1 × 1 mm), axial turbo spin-echo T1-weighted images (TR = 574 ms, TE = 13 ms, flip angle = 140°, voxel size = 0.5 × 0.5 × 3 mm, echo-train length = 2), axial turbo spin-echo T2-weighted images (TR = 9000 ms, TE = 150 ms, flip angle = 120°, voxel size = 0.5 × 0.5 × 3 mm, echo-train length = 19), and coronal turbo spin-echo T2-weighted images (TR = 9210 ms, TE = 187 ms, flip angle = 130°, voxel size = 0.4 × 0.4 × 3 mm, echo-train length = 19). Diffusion-weighted imaging included multidirectional diffusion-weighted measurements (TR = 6200 ms, TE = 92 ms, bandwidth = 1984 Hz/Px, FOV = 140 mm, voxels = 2 × 2 × 2 mm, 30 b-directions with amplitudes ranging from 0 to 1000 s/mm2). 1H-MR spectroscopy measurements were acquired at TE = 44 and 288 ms in the left thalamus and basal ganglia.
For noncritically ill neonates, we used a “feed and wrap” protocol, which is based on the timing of feeds, induction of natural sleep, and immobilization with wrapping to avoid the need for anesthetic agents.14 MR imaging scans with motion artifacts were excluded from the interrater analysis.
The pattern of brain injury was evaluated according to the novel grading system.10
The maximum total score of the grading system is 57, comprising gray matter (maximum GM subscore = 25), white matter (maximum WM subscore = 21), cerebellum (maximum cerebellum subscore = 8), and an additional subscore (maximum additional subscore = 3). The additional subscore describes the presence of intraventricular or subdural hemorrhage and sinovenous thrombosis. The score of 1H-MR spectroscopy was included in the gray matter subscore.10
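The composition of the total score can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above: the dictionary keys, the function name, and the example subscores are hypothetical and do not come from the original scoring sheet.

```python
# Maximum subscores as stated in the text; key names are illustrative only.
MAX_SUBSCORES = {
    "gray_matter": 25,   # includes the 1H-MR spectroscopy item
    "white_matter": 21,
    "cerebellum": 8,
    "additional": 3,     # hemorrhage and sinovenous thrombosis items
}

# Maximum total score implied by the four maxima.
MAX_TOTAL = sum(MAX_SUBSCORES.values())


def total_score(subscores):
    """Return the total score after checking each subscore against its maximum."""
    for name, value in subscores.items():
        if name not in MAX_SUBSCORES:
            raise KeyError(f"unknown subscore: {name}")
        if not 0 <= value <= MAX_SUBSCORES[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_SUBSCORES[name]}")
    return sum(subscores.values())


# Hypothetical reading for one infant.
example = {"gray_matter": 3, "white_matter": 4, "cerebellum": 0, "additional": 1}
example_total = total_score(example)
```

The range check is a convenience for catching data-entry errors and is not part of the published scoring system.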
The MR images of the first 20 neonates were evaluated on the basis of the description of the original article. We identified discrepancies during the evaluation of the first 20 MR imaging scans and adjusted the scoring system accordingly. A further series of 20 MR images was scored after adjustment of the scoring system.
The adjustments were the following: 1) The gestational age of the infants was taken into consideration when evaluating the myelination of the posterior limb of the internal capsule (PLIC) and the peak of the NAA (Fig 1A, -B); 2) a lesion that had involvement of both WM and the cortex was scored only individually for the principal area injured (Fig 1C); and 3) the extension of signal abnormality (involving 1 lobe or >1 lobe) was scored on the basis of the primary area of injury (Fig 1D). The images were analyzed by a pediatric neuroradiologist (E.Y.) and a dual-board-certified neonatologist and child neurologist (T.E.I.), who were blinded to the stage of neonatal encephalopathy.
Fig 1. A, The gestational age of the infants was considered when evaluating the myelination of the PLIC. In the coronal T1-weighted image, the myelination of the PLIC was considered age-appropriate for a near-term infant (35 weeks of gestation) and was scored as normal. B, The gestational age of the infants was considered when evaluating the peak of the NAA. In 1H-MR spectroscopy (TE = 30 ms), the NAA peak was considered age-appropriate for a near-term infant (35 weeks of gestation) and was scored as normal. C, A lesion that involved both the WM and cortex was scored individually only for the principal area. In axial DWI, the diffusion restriction in the cortex and its location were scored as focal (1 lobe) and unilateral (score of 2). The WM involvement was scored individually as focal and unilateral (score of 2). D, The extension of signal abnormality (involving 1 lobe or >1 lobe) was scored on the basis of the primary area of injury. In axial ADC mapping, the diffusion restriction in the WM was scored as focal (score of 1) because only the frontal lobe was involved, and the location was scored as bilateral (score of 2).
Interrater agreement was assessed by total score, subscores, and the severity of brain injury (normal, mild, moderate, and severe) before (n = 20) and after the adjustment of the scoring system (n = 20).
Statistical Analysis
Interrater reliability was evaluated by calculating the intraclass correlation coefficient (ICC) with a 2-way random-effects model for the total score and the WM, GM, cerebellum, and additional subscores. In addition, Bland-Altman analysis was performed to assess the absolute limits of interobserver agreement for continuous variables.
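As a rough illustration of the reliability statistic named above, the following Python sketch computes a single-measure, absolute-agreement ICC from a 2-way ANOVA decomposition (often written ICC(2,1)). The text does not state which form of the 2-way random-effects ICC was selected in SPSS, so the absolute-agreement, single-measure choice is an assumption, as are the function name and the example ratings.

```python
def icc_2_1(ratings):
    """Single-measure, absolute-agreement ICC for a two-way random-effects model.

    `ratings` is a list of rows, one per subject, each holding one score per rater.
    """
    n = len(ratings)          # number of subjects
    k = len(ratings[0])       # number of raters
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 raters")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns), and residual error.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical total scores from 2 readers for 5 scans (not study data).
example_ratings = [[2, 3], [0, 0], [12, 10], [5, 6], [20, 22]]
example_icc = icc_2_1(example_ratings)
```

Because the absolute-agreement form penalizes a systematic offset between readers, it is a natural companion to the Bland-Altman bias; a consistency-type ICC would ignore such an offset.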
The percentiles of the total score in the full cohort (n = 252) were calculated to determine the cutoff values for normal brain and mild, moderate, and severe brain injury. Weighted κ (κW) statistics were used to determine the agreement between the readers for the severity of brain injury as a categorical variable. The McNemar test was used to determine whether the severity of brain injury, as a categorical variable, differed between the readers. We used SPSS, Version 22 (IBM) and GraphPad Prism, Version 8.1.2 for macOS (GraphPad Software) to analyze and plot the data.
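For readers unfamiliar with the weighted κ, a minimal Python sketch is given below. The weighting scheme is not stated in the text, so linear weights are assumed by default and quadratic weights are offered as an option; the function name, the integer coding of the categories (0 = normal, 1 = mild, 2 = moderate, 3 = severe), and the example ratings are hypothetical.

```python
def weighted_kappa(reader1, reader2, n_categories, weights="linear"):
    """Weighted kappa for two raters using disagreement weights.

    Ratings are integer category codes in range(n_categories).
    """
    if len(reader1) != len(reader2) or not reader1:
        raise ValueError("both readers must rate the same non-empty set of scans")
    n = len(reader1)

    # Cross-tabulate the two readers' classifications.
    observed = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(reader1, reader2):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_categories))
                  for j in range(n_categories)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            w = abs(i - j) if weights == "linear" else (i - j) ** 2
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row_totals[i] * col_totals[j] / n

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical severity classifications for 8 scans (not study data).
r1 = [0, 0, 1, 2, 3, 0, 1, 2]
r2 = [0, 0, 1, 1, 3, 0, 0, 2]
example_kappa = weighted_kappa(r1, r2, n_categories=4)
```

With disagreement weights, a normal-versus-severe discrepancy counts more heavily than a normal-versus-mild one, which is why a weighted rather than unweighted κ suits ordered severity categories.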
RESULTS
The demographic and prenatal data of the total cohort are presented in Table 1. Fifty-three percent of the infants had mild HIE based on the modified Sarnat staging system, reflecting our institutional policy offering cooling to milder cases. The brain MR imaging scans of the 252 infants were performed at a median of 4.0 (interquartile range = 3.0–4.0) days of life. The randomly selected 40 MR images were evaluated by 2 experienced readers, resulting in a total of 80 reads.
Table 1: Demographics and prenatal data of the full cohort
In the first epoch of the study, 20 MR images were scored by 2 readers on the basis of the description of the original article of Weeke et al.10 There was strong interrater agreement for the total score with an ICC of 0.96 (95% CI, 0.89–0.99). The ICC for subscores also showed an excellent agreement between the raters (Table 2).
Table 2: Interrater reliability for subscores and for total score
In the Bland-Altman analysis, the average of the differences [SD] was 1.80 [3.7] for the total score with −5.5 to 9.1 limits of agreement. Regarding the subscores, there were no mean differences of >2 points (Online Supplemental Data).
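The reported limits of agreement follow from the usual Bland-Altman convention of bias ± 1.96 × SD of the differences. The Python sketch below shows the calculation and checks it against the summary values quoted above; the function names and the multiplier argument are illustrative, and the use of the sample SD is an assumption about how the study values were computed.

```python
import statistics


def limits_of_agreement(bias, sd, z=1.96):
    """Return the lower and upper 95% limits of agreement from summary values."""
    return bias - z * sd, bias + z * sd


def bland_altman(reader1, reader2, z=1.96):
    """Return (bias, sd, lower limit, upper limit) for paired scores."""
    differences = [a - b for a, b in zip(reader1, reader2)]
    bias = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample SD (n - 1 denominator)
    lower, upper = limits_of_agreement(bias, sd, z)
    return bias, sd, lower, upper


# Summary values reported for the total score in the first epoch.
lower, upper = limits_of_agreement(1.80, 3.7)
# lower is about -5.45 and upper about 9.05, which round to the
# reported limits of -5.5 and 9.1.
```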
The severity of brain injury was classified on the basis of the distribution of total scores in the full cohort of 252 infants. Figure 2 shows the frequency distribution of the total score in the full cohort. The median total score was 2 (range, 0–41 points). Subcategories of brain injury were defined as follows: a total score of ≤4 (≤75th percentile) as normal, 5–10 (76th–90th percentile) as mild, 11–15 (91st–95th percentile) as moderate, and >15 (>95th percentile) as severe brain injury.
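The cutoff values above translate directly into a lookup from total score to severity category. The following Python sketch is one way to express that mapping; the function and constant names are hypothetical, and the upper bound of 57 is the maximum total score of the grading system.

```python
from bisect import bisect_left

# Upper bounds (inclusive) of the normal, mild, and moderate categories.
UPPER_BOUNDS = [4, 10, 15]
CATEGORIES = ["normal", "mild", "moderate", "severe"]
MAX_TOTAL_SCORE = 57


def classify_severity(total_score):
    """Map a total MR imaging score to the severity category defined above."""
    if not 0 <= total_score <= MAX_TOTAL_SCORE:
        raise ValueError("total score must be between 0 and 57")
    # bisect_left returns how many upper bounds lie strictly below the score.
    return CATEGORIES[bisect_left(UPPER_BOUNDS, total_score)]


# The cohort median (2) and maximum (41) reported above.
median_category = classify_severity(2)
maximum_category = classify_severity(41)
```

Because the cutoffs were derived from percentiles of this cohort, the constants would need to be re-estimated before applying the mapping to a population with a different mix of HIE severity.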
Fig 2. The frequency distribution of the total score in the full cohort.
The 2 readers agreed that the MR imaging findings of 10/20 (50%) scans were within normal limits, those of 2/20 (10%) scans were moderate, and those of 3/20 (15%) scans were severe brain injury. However, the severity of brain injury in 5 infants (5/20, 25%) was graded differently by the 2 observers. Reader 1 classified the findings of 3 MR images (patients 1, 12, and 19) as moderate, whereas reader 2 rated them as mild (patients 1 and 19) or normal (patient 12). The severity of brain injury was also classified differently by the 2 readers in patient 2 (mild versus normal) and in patient 6 (severe versus mild). The κW, calculated to assess the agreement between the 2 observers on the severity of brain injury, showed substantial agreement (κW = 0.723) (Fig 3A and Table 3). Figure 3 shows the severity of brain injury based on the total score for each of the subjects.
Fig 3. The severity of brain injury based on the total score for each subject in the first (A) and second epochs (B).
Table 3: Severity of injury based on the total scores
In the second epoch of the study, 20 MR imaging scans were evaluated by the same readers after the adjustment of the grading system. The adjustments were based on the main discrepancies between the 2 readers in the first epoch, including the assessment of myelination in the PLIC, the peak of the NAA level, cortical involvement, and the extent of the WM injury.
In the second epoch, the ICC for the total score and subscores also indicated excellent reliability between the 2 readers, similar to that of the first epoch, with the exception of the additional subscore (Table 2). Overall, both the average of the differences (bias) and the limits of agreement improved for the total score and the subscores (Online Supplemental Data).
In addition, only 3 MR images (15%) were classified differently by the 2 observers (Fig 3B). In line with this finding, the κW showed very good agreement between the 2 readers' classifications of the severity of brain injury (κW = 0.887) compared with the substantial agreement in the first epoch (κW = 0.723). The McNemar test showed that the proportions in each category did not differ significantly between the readers, similar to the finding in the first epoch (Table 3).
DISCUSSION
This study has demonstrated the utility of a novel MR imaging scoring system in a cohort of neonates with a wide range of HIE severity. It also showed the potential advantage of adjusting some of its subscores. The interrater reliability showed an excellent level of agreement for the total score between the 2 experienced readers both before and after the adjustment of the scoring system. The Bland-Altman plot revealed, overall, a decreasing bias between the 2 readers and a narrower agreement interval for the subscores after the adjustment. In addition, the agreement between the 2 readers' classifications on the severity of brain injury greatly improved in the second epoch.
The presented cutoff values may be used to determine the severity of brain injury in future clinical studies. However, the cutoff values derived from the percentiles of the total scores in the full cohort may reflect our diverse patient population, including infants with mild HIE. Hence, these cutoff values may not be applicable to centers that provide TH to infants with only moderate and severe HIE.
In recent years, the inclusion criteria for hypothermia have been broadened, and TH has been offered increasingly to near-term infants.15,16
In line with these criteria, the first adjustment related to the gestational age of infants with HIE. Both the metabolic profile and the myelination change as the brain matures. The rate of increase in the NAA peak is related to the maturation process.17 Likewise, an increase in myelinated WM can be detected between 35 and 41 weeks of gestation.18 Therefore, in the assessment of the NAA peak and the absence of myelination in the PLIC, the gestational age must be considered. The consistent evaluation of these 2 items is also important because both abnormal signal in the PLIC and NAA concentration have a good predictive value for the neurodevelopmental outcomes.19,20
The second and third adjustments concerned the involvement of the WM and the cortical area. The retrospective study of Rao et al16 found that WM injury was the most frequent pattern among near-term infants, followed by GM injury and cortex involvement. Furthermore, isolated WM and cortical abnormalities were associated with communication and behavioral problems, visual impairment, and seizures.21 Hence, the consistent evaluation of WM and cortex involvement has a major role in the prediction of long-term outcomes. Moreover, the inconsistent scoring of brain injury can change the category of severity.
The study has several limitations that should be taken into consideration. First, the statistical analysis should be interpreted with caution given the small sample size. Second, because of the small sample size, the 95% limits of agreement in the Bland-Altman plot may be unreliable for estimating agreement in larger populations. Another limitation of the study is that we did not validate our cutoff values against long-term neurodevelopmental outcome data.
CONCLUSIONS
The novel grading system developed by Weeke et al10 provides a detailed evaluation of the neonatal brain with hypoxic-ischemic injury using DWI and 1H-MR spectroscopy sequences. The modification of the scoring system may help with the correct interpretation of the selected items and can lead to a higher degree of interrater agreement. The presented cutoff values may be used to determine the severity of brain injury in future clinical studies, including those of infants with mild HIE. Further studies are needed to determine the cutoff values of this novel grading system for the severity of brain injury in relation to neurodevelopmental sequelae.
Footnotes
Disclosures: Edward Yang—UNRELATED: Consultancy: CorticoMetrics, Comments: reviewed brain MRIs for a company developing software to recognize cortical dysplasia, last time in 2017. Terrie E. Inder—UNRELATED: Consultancy: Aspect Imaging, Comments: I am on the Scientific Advisory Board for this novel MR imaging company manufacturing a neonatal MRI system; Expert Testimony: occasional legal services, Comments: occasional medicolegal opinion; Grants/Grants Pending: federal and foundations.* *Money paid to the institution.
References
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- Received May 11, 2020.
- Accepted after revision November 23, 2020.
- © 2021 by American Journal of Neuroradiology