As the use of machine learning algorithms in the development of clinical prediction models has increased, researchers are becoming more aware of the deleterious effects that stem from a lack of reporting standards. One of the most obvious consequences is the insufficient reproducibility of current prediction models. To characterize methods for improving reproducibility and enabling better clinical performance, we use a previously proposed taxonomy that separates reproducibility into three components: technical, statistical, and conceptual reproducibility. Following this framework, we discuss common errors that lead to poor reproducibility, highlight the importance of generalizability when evaluating an ML model's performance, and offer suggestions for optimizing generalizability to ensure adequate performance. These efforts are a necessity before such models are applied to patient care.
DOI: 10.1016/j.spinee.2020.10.006
PubMedID: 33065274