Cancer stage is rarely captured in structured form in the electronic health record (EHR). We evaluate the performance of a classifier, trained on structured EHR data, in identifying prostate cancer patients with metastatic disease. Using EHR data for a cohort of 5,861 prostate cancer patients mapped to the Observational Health Data Sciences and Informatics (OHDSI) data model, we constructed feature vectors containing frequency counts of conditions, procedures, medications, observations and laboratory values. Staging information from the California Cancer Registry was used as the ground truth. For identifying patients with metastatic disease, a random forest model achieved a precision of 0.90 and a recall of 0.40 using data within 12 months of diagnosis. By comparison, an ICD code-based query achieved a precision of 0.33 and a recall of 0.54. High-precision classifiers using hundreds of structured data elements achieve substantially higher precision than ICD code-based queries, at the cost of somewhat lower recall, and may assist in identifying cohorts for observational research or clinical trial matching.
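The frequency-count features and the precision/recall comparison described above can be illustrated with a minimal sketch. It is not the study's implementation: the function names `build_feature_vector` and `precision_recall`, the concept labels, and the toy labels are all hypothetical, and the random forest itself is omitted.

```python
from collections import Counter


def build_feature_vector(events, vocabulary):
    """Count how often each coded concept appears in one patient's record.

    `events` is a list of concept labels (conditions, procedures, medications,
    observations, laboratory values); `vocabulary` fixes the feature order.
    Both are hypothetical stand-ins for data mapped to a common data model.
    """
    counts = Counter(events)
    return [counts.get(concept, 0) for concept in vocabulary]


def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = metastatic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy example with made-up concept labels and made-up predictions.
vocabulary = ["condition:A", "procedure:B", "lab:C"]
patient_events = ["condition:A", "lab:C", "condition:A"]
features = build_feature_vector(patient_events, vocabulary)

registry_labels = [1, 1, 0, 0, 1]   # stand-in for registry-derived ground truth
model_labels = [1, 0, 0, 1, 0]      # stand-in for classifier output
precision, recall = precision_recall(registry_labels, model_labels)
```

In this framing, a high-precision, lower-recall classifier returns fewer patients than a code-based query, but a larger share of those returned truly have metastatic disease, which is the property that matters when assembling a cohort.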
PubMed ID: 30815195