Performance Comparison of Individual and Ensemble CNN Models for the Classification of Brain 18F-FDG-PET Scans. Journal of Digital Imaging. Nobashi, T., Zacharias, C., Ellis, J. K., Ferri, V., Koran, M. E., Franc, B. L., Iagaru, A., Davidzon, G. A. 2019

Abstract

The high background glucose metabolism of normal gray matter on [18F]-fluoro-2-D-deoxyglucose (FDG) positron emission tomography (PET) of the brain results in a low signal-to-background ratio, which may increase the chance of missing important findings in patients with intracranial malignancies. To explore the use of a deep learning classifier as an aid in distinguishing normal from abnormal findings on PET brain images, this study evaluated the performance of a two-dimensional convolutional neural network (2D-CNN) in classifying FDG PET brain scans as normal (N) or abnormal (A).

Two hundred eighty-nine brain FDG-PET scans (N, n = 150; A, n = 139), yielding a total of 68,260 images, were included. Nine individual 2D-CNN models, with three different window settings for each of the axial, coronal, and sagittal axes, were trained and validated. The performance of these individual models and of ensemble models was evaluated and compared using a test dataset. Odds ratio, Akaike's information criterion (AIC), area under the curve (AUC) of the receiver operating characteristic curve, accuracy, and standard deviation (SD) were calculated.

The optimal window setting for classifying normal and abnormal scans differed for each axis of the individual models. An ensemble model using different axes with an optimized window setting (window-triad) performed better than ensemble models using the same axis with different window settings (axis-triad). An increase in odds ratio and a decrease in SD were observed in both axis-triad and window-triad models compared with individual models, whereas improvements in AUC and AIC were seen in window-triad models. An overall model averaging the probabilities of all individual models showed the best accuracy, 82.0%.

Ensembling data across different window settings and axes was effective in improving 2D-CNN performance parameters for the classification of brain FDG-PET scans. If prospectively validated with a larger cohort of patients, similar models could provide decision support in a clinical setting.
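The overall model described above averages the probabilities of all nine individual models. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes that each model has already produced a single scan-level probability of "abnormal". The abstract does not say how per-image outputs were aggregated per scan, and the window names, the probability values, and the 0.5 threshold are all hypothetical.

```python
from itertools import product
from statistics import fmean

# Hypothetical labels: the abstract names the three axes but not the window settings.
AXES = ("axial", "coronal", "sagittal")
WINDOWS = ("window_1", "window_2", "window_3")


def ensemble_probability(model_probs):
    """Average P(abnormal) for one scan across the individual models."""
    return fmean(model_probs.values())


def classify_scan(model_probs, threshold=0.5):
    """Return "A" (abnormal) or "N" (normal); the threshold is an assumption."""
    return "A" if ensemble_probability(model_probs) >= threshold else "N"


# Made-up scan-level outputs of the nine (axis, window) models for two scans.
scan_1 = dict(zip(product(AXES, WINDOWS),
                  [0.9, 0.8, 0.7, 0.6, 0.4, 0.9, 0.8, 0.7, 0.6]))
scan_2 = dict(zip(product(AXES, WINDOWS),
                  [0.2, 0.3, 0.1, 0.6, 0.4, 0.2, 0.3, 0.1, 0.5]))

print(classify_scan(scan_1), classify_scan(scan_2))
```

In this made-up example, individual models disagree on each scan (for instance, one model scores scan_1 at 0.4 and another scores scan_2 at 0.6), and averaging smooths out those disagreements. This is one plausible reading of why the abstract reports lower SD for the ensemble models.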

DOI: 10.1007/s10278-019-00289-x

PubMed ID: 31659587