Crowd Sourced Assessment of Ureteroscopy with Laser Lithotripsy video feed does not correlate with Trainee Experience. Journal of Endourology. Conti, S. L., Brubaker, W., Chung, B. I., Sofer, M., Hsi, R. S., Shinghal, R., Elliott, C. S., Caruso, T., Leppert, J. T. 2018


OBJECTIVES: We sought to validate the use of crowdsourced surgical video evaluation for assessing flexible ureteroscopic laser lithotripsy videos, using a modified global assessment scale previously validated for ureteroscopic skills.

METHODS: We collected video feeds from 30 intrarenal ureteroscopic laser lithotripsy cases in which residents in postgraduate years (PGY) 2 through 6 handled the ureteroscope. The video feeds were annotated to represent overall performance and to include the parts of the procedure being scored. The videos were submitted to a commercially available surgical video evaluation platform. We used a validated ureteroscopic laser lithotripsy global assessment tool, modified to account for the fact that raters scored the video feed only. Videos were evaluated by crowd workers recruited through Amazon's Mechanical Turk and by five endourology-trained experts. Mean scores were calculated, and intraclass correlation coefficients (ICCs) were computed for the expert domain and total scores; the ICCs were estimated using a linear mixed-effects model. Spearman rank correlation coefficients were calculated to measure the strength of the relationships between the crowd mean scores and the expert average scores.

RESULTS: The 30 videos were reviewed 2,488 times by 487 crowd workers and five expert endourologists. ICCs between expert raters were all below accepted levels of correlation (0.30), with the overall score having an ICC of 0.000. Overall, crowd scores did not correlate with expert scores except in the stone retrieval domain (ρ = 0.60, p = 0.015). Crowdsourced scores were negatively correlated with PGY level (-0.44, p = 0.019).

CONCLUSIONS: Given the poor agreement among experts and the poor correlation between expert and crowd scores when evaluating video feeds of ureteroscopic laser lithotripsy, assessment of skills using intraoperative video feeds may not be reliable. This is further supported by the inverse correlation between crowd scores and PGY level.
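The two agreement statistics named in the Methods can be illustrated with a short Python sketch. It computes Spearman's rank correlation (Pearson correlation of ranks, with tied values given their average rank) between per-video crowd and expert scores. It also computes a one-way random-effects ICC(1) from one-way ANOVA mean squares. The authors estimated their ICCs with a linear mixed-effects model, so ICC(1) here is a simpler swapped-in estimator, not the study's method. The function names (`average_ranks`, `spearman`, `icc1`) and all scores below are hypothetical illustrations, not data from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def icc1(ratings):
    """One-way random-effects ICC(1) for a complete videos x raters matrix.

    Hypothetical stand-in for the study's mixed-model ICC; assumes every
    rater scored every video (no missing cells).
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    # Between-video and within-video mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(ratings, row_means) for v in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

For example, with made-up per-video means, `spearman([3.1, 2.8, 3.5, 2.9], [3.0, 2.5, 4.0, 3.2])` gives 0.8, because the two rankings differ only by one swapped adjacent pair. ICC(1) requires every rater to score every video. A mixed-effects model, by contrast, can accommodate unbalanced designs in which raters score different subsets of videos.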

