Purpose: To develop an automated model for staging knee osteoarthritis severity from radiographs and to compare its performance to that of musculoskeletal radiologists.

Materials and Methods: Radiographs from the Osteoarthritis Initiative staged by a radiologist committee using the Kellgren-Lawrence (KL) system were used. Before the images were used as input to a convolutional neural network model, they were standardized and augmented automatically. The model was trained with 32116 images, tuned with 4074 images, evaluated with a 4090-image test set, and compared to two individual radiologists using a 50-image test subset. Saliency maps were generated to reveal features used by the model to determine KL grades.

Results: With committee scores used as ground truth, the model had an average F1 score of 0.70 and an accuracy of 0.71 for the full test set. For the 50-image subset, the best individual radiologist had an average F1 score of 0.60 and an accuracy of 0.60; the model had an average F1 score of 0.64 and an accuracy of 0.66. Cohen weighted kappa between the committee and model was 0.86, comparable to intraexpert repeatability. Saliency maps identified sites of osteophyte formation as influential to predictions.

Conclusion: An end-to-end interpretable model was developed that takes full radiographs as input, predicts KL scores with state-of-the-art accuracy, performs as well as musculoskeletal radiologists, and does not require manual image preprocessing. Saliency maps suggest the model's predictions were based on clinically relevant information.

Supplemental material is available for this article. © RSNA, 2020.
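For readers unfamiliar with the three agreement measures reported in the Results, the following is a minimal sketch of how accuracy, an average F1 score, and a weighted Cohen kappa can be computed for ordinal KL grades 0 to 4. The abstract does not state which F1 averaging scheme or which kappa weighting was used, so the unweighted per-grade (macro) average and the quadratic weights below are assumptions. The function names are hypothetical, and the toy labels are illustrative rather than study data.

```python
from collections import Counter

KL_GRADES = [0, 1, 2, 3, 4]  # Kellgren-Lawrence grades, treated as ordinal


def accuracy(truth, pred):
    """Fraction of images whose predicted grade equals the reference grade."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def macro_f1(truth, pred, labels=KL_GRADES):
    """Unweighted mean of per-grade F1 scores (assumed averaging scheme)."""
    scores = []
    for g in labels:
        tp = sum(t == g and p == g for t, p in zip(truth, pred))
        fp = sum(t != g and p == g for t, p in zip(truth, pred))
        fn = sum(t == g and p != g for t, p in zip(truth, pred))
        denom = 2 * tp + fp + fn
        # A grade with no true or predicted cases contributes 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def weighted_kappa(truth, pred, labels=KL_GRADES, power=2):
    """Cohen weighted kappa; power=2 gives quadratic weights (assumed)."""
    n = len(truth)
    observed = Counter(zip(truth, pred))
    truth_counts = Counter(truth)
    pred_counts = Counter(pred)
    span = len(labels) - 1
    disagreement_obs = 0.0
    disagreement_exp = 0.0
    for i in labels:
        for j in labels:
            # Penalty grows with the distance between the two grades.
            w = (abs(i - j) / span) ** power
            disagreement_obs += w * observed[(i, j)] / n
            disagreement_exp += w * truth_counts[i] * pred_counts[j] / n ** 2
    # Undefined (division by zero) if both raters use one identical grade only.
    return 1.0 - disagreement_obs / disagreement_exp


if __name__ == "__main__":
    # Hypothetical labels: one grade-4 knee is called grade 3.
    committee = [0, 1, 2, 3, 4]
    model = [0, 1, 2, 3, 3]
    print("accuracy:", accuracy(committee, model))
    print("macro F1:", macro_f1(committee, model))
    print("weighted kappa:", weighted_kappa(committee, model))
```

Because the weights penalize a miss by its distance on the KL scale, a prediction that is off by one grade lowers kappa far less than it lowers accuracy or F1. This is one way a weighted kappa can sit well above the accuracy measured on the same test set.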
DOI: 10.1148/ryai.2020190065
PubMed ID: 32280948