Development of a natural language processing system for extracting rheumatoid arthritis outcomes from clinical notes using the national RISE registry. Arthritis Care & Research. Humbert-Droz, M., Izadi, Z., Schmajuk, G., Gianfrancesco, M., Baker, M. C., Yazdany, J., Tamang, S. 2022

Abstract

OBJECTIVE: To accelerate the use of outcome measures in rheumatology, we developed and evaluated a natural language processing (NLP) pipeline for extracting these measures from free-text outpatient rheumatology notes within the ACR's Rheumatology Informatics System for Effectiveness (RISE) registry.

METHODS: We included all patients in RISE (2015 to 2018). The NLP pipeline extracted scores corresponding to eight measures of RA disease activity (DA) and functional status (FS) documented in outpatient rheumatology notes. Score extraction performance was evaluated by chart review, and we assessed agreement with scores documented in structured data. We conducted an external validation of our NLP pipeline using data from rheumatology notes from an academic medical center that is not included in the RISE registry.

RESULTS: We processed over 34 million notes from 854,628 patients, 158 practices, and 24 EHR systems from RISE. Manual chart review revealed a sensitivity, positive predictive value (PPV), and F1 score of 95%, 87%, and 91%, respectively. Substantial agreement was observed between scores extracted from RISE notes and scores derived from structured data (kappa: 0.43-0.68 among DA and 0.86-0.98 among FS measures). In the external validation, we found a sensitivity, PPV, and F1 score of 92%, 69%, and 79%, respectively.

CONCLUSIONS: We developed an NLP pipeline to extract RA outcome measures from a national registry of notes from multiple EHR systems and found it to have good internal and external validity. This pipeline can facilitate measurement of clinical and patient-reported outcomes for use in research and quality measurement.
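The reported F1 scores can be checked against the reported sensitivity and PPV. The sketch below assumes the standard definition of F1 as the harmonic mean of sensitivity (recall) and PPV (precision). The abstract does not state the formula used, and the function name `f1_score` is illustrative, not part of the authors' pipeline.

```python
def f1_score(sensitivity: float, ppv: float) -> float:
    """Harmonic mean of sensitivity (recall) and PPV (precision).

    Assumes the standard F1 definition; the abstract does not
    spell out how the authors computed it.
    """
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# Values taken from the RESULTS section of the abstract.
internal = f1_score(0.95, 0.87)   # RISE chart review
external = f1_score(0.92, 0.69)   # external academic medical center

print(round(internal, 2))  # 0.91
print(round(external, 2))  # 0.79
```

Under this assumption, both results round to the F1 values given in the abstract (91% and 79%).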

View details for DOI 10.1002/acr.24869

View details for PubMedID 35157365