Automatic detection of contouring errors using convolutional neural networks. Medical Physics. Rhee, D. J., Cardenas, C. E., Elhalawani, H., McCarroll, R., Zhang, L., Yang, J., Garden, A. S., Peterson, C. B., Beadle, B. M., Court, L. E. 2019.


PURPOSE: To develop an auto-contouring tool for head and neck normal structures that could be used to automatically detect errors in auto-contours from a clinically validated auto-contouring tool.

METHODS: An auto-contouring tool based on convolutional neural networks (CNN) was developed for 16 normal structures of the head and neck and tested for its ability to identify contour errors from a clinically validated multi-atlas-based auto-contouring system (MACS). The CT scans and clinical contours from 3495 patients were semi-automatically curated and used to train and validate the CNN-based auto-contouring tool. The final accuracy of the tool was evaluated by calculating the Sørensen-Dice similarity coefficients (DSC) and Hausdorff distances between the automatically generated contours and physician-drawn contours on 174 internal and 24 external CT scans. Lastly, the CNN-based tool was evaluated on 60 patients' CT scans to investigate whether it could detect contouring failures. The contouring failures on these patients were classified as either minor or major errors. The criteria for detecting contouring errors were determined by analyzing the DSC between the CNN- and MACS-based contours under two independent scenarios: (1) contours with minor errors are clinically acceptable and (2) contours with minor errors are clinically unacceptable.

RESULTS: The average DSC and Hausdorff distance of our CNN-based tool were 98.4%/1.23 cm for brain, 89.1%/0.42 cm for eyes, 86.8%/1.28 cm for mandible, 86.4%/0.88 cm for brainstem, 83.4%/0.71 cm for spinal cord, 82.7%/1.37 cm for parotids, 80.7%/1.08 cm for esophagus, 71.7%/0.39 cm for lenses, 68.6%/0.72 cm for optic nerves, 66.4%/0.46 cm for cochleas, and 40.7%/0.96 cm for optic chiasm. Excluding the optic chiasm, the error detection tool correctly detected, on average, a proportion of 0.99 of the clinically unacceptable MACS contours when contours with minor errors were considered clinically acceptable, and 0.80 when they were considered clinically unacceptable. The corresponding proportions of clinically acceptable MACS contours that were correctly detected were 0.81 and 0.60 on average, again excluding the optic chiasm.

CONCLUSION: Our CNN-based auto-contouring tool performed well on both the publicly available and the internal datasets. Furthermore, our results show that CNN-based algorithms are able to identify ill-defined contours from a clinically validated and clinically used multi-atlas-based auto-contouring tool. Therefore, our CNN-based tool can effectively perform automatic verification of MACS contours.
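
The abstract does not describe how the metrics or the detection criteria were implemented. As a rough illustration only, the sketch below computes a Sørensen-Dice similarity coefficient and a symmetric Hausdorff distance between two contours, then flags a MACS contour when its DSC against the CNN contour falls below a threshold. It assumes each contour is a set of voxel index tuples, so the distance is in voxel units rather than cm. The function names (`dice`, `hausdorff`, `flag_contour`) and the 0.8 threshold are hypothetical and are not taken from the paper.

```python
from math import dist


def dice(a, b):
    """Sorensen-Dice similarity coefficient between two voxel sets."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumption: two empty contours count as perfect agreement.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets.

    Brute-force O(len(a) * len(b)); fine for a toy example, too slow
    for full 3D masks.
    """
    def directed(p, q):
        # Largest distance from a point in p to its nearest point in q.
        return max(min(dist(x, y) for y in q) for x in p)

    return max(directed(a, b), directed(b, a))


def flag_contour(cnn_voxels, macs_voxels, threshold):
    """Return True when CNN/MACS agreement (DSC) is below the threshold.

    The threshold is a hypothetical, user-chosen value; the abstract
    does not state the criteria that were actually derived.
    """
    return dice(cnn_voxels, macs_voxels) < threshold


# Toy 2D example: two 2x2 squares overlapping in one column.
cnn = {(0, 0), (0, 1), (1, 0), (1, 1)}
macs = {(0, 1), (1, 1), (0, 2), (1, 2)}

print(dice(cnn, macs))                         # 0.5
print(hausdorff(cnn, macs))                    # 1.0
print(flag_contour(cnn, macs, threshold=0.8))  # True
```

To report distances in physical units, the voxel indices would first need to be scaled by the CT voxel spacing, which this sketch leaves out.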

DOI: 10.1002/mp.13814

PubMedID: 31505046