Predicting treatment retention in medication for opioid use disorder: a machine learning approach using NLP and LLM-derived clinical features.
Journal of the American Medical Informatics Association : JAMIA. Nateghi Haredasht, F., Lopez, I., Tate, S., Ashtari, P., Chan, M. M., Kulkarni, D., Chen, C. A., Vangala, M., Griffith, K., Bunning, B., Miner, A. S., Hernandez-Boussard, T., Humphreys, K., Lembke, A., Vance, L. A., Chen, J. H. 2025
Abstract
OBJECTIVE: Building upon our previous work on predicting treatment retention in medications for opioid use disorder, we aimed to improve 6-month retention prediction in buprenorphine-naloxone (BUP-NAL) therapy by incorporating features derived from large language models (LLMs) applied to unstructured clinical notes.
MATERIALS AND METHODS: We used de-identified electronic health record (EHR) data from Stanford Health Care (STARR) for model development and internal validation, and the NeuroBlu behavioral health database for external validation. Structured features were supplemented with 13 clinical and psychosocial features extracted from free-text notes using the CLinical Entity Augmented Retrieval pipeline, which combines named entity recognition with LLM-based classification to provide contextual interpretation. We trained classification (Logistic Regression, Random Forest, XGBoost) and survival models (CoxPH, Random Survival Forest, Survival XGBoost), evaluated using Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) and C-index.
RESULTS: XGBoost achieved the highest classification performance (ROC-AUC=0.65). Incorporating LLM-derived features improved model performance across all architectures, with the largest gains observed in simpler models such as Logistic Regression. In time-to-event analysis, Random Survival Forest and Survival XGBoost reached the highest C-index (0.65). SHapley Additive exPlanations analysis identified LLM-extracted features like Chronic Pain, Liver Disease, and Major Depression as key predictors. We also developed an interactive web tool for real-time clinical use.
DISCUSSION: Features extracted using NLP and LLM-assisted methods improved model accuracy and interpretability, revealing valuable psychosocial risks not captured in structured EHRs.
CONCLUSION: Combining structured EHR data with LLM-extracted features moderately improves BUP-NAL retention prediction, enabling personalized risk stratification and advancing AI-driven care for substance use disorders.
DOI: 10.1093/jamia/ocaf157
PubMedID: 40977375
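
Note on the evaluation metrics: the abstract reports ROC-AUC for the classification models and the C-index for the survival models. As a rough illustration of what these two numbers measure, the sketch below computes both by direct pairwise comparison. It is not the authors' code. The function names, the choice to skip tied event times, and the toy data are all assumptions made for this example, and none of the values come from the study.

```python
from itertools import combinations


def roc_auc(labels, scores):
    """Pairwise ROC-AUC: the share of (positive, negative) pairs in which the
    positive case has the higher score. Tied scores count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def c_index(times, events, risks):
    """Harrell-style concordance index for right-censored data.

    A pair is comparable when the subject with the shorter follow-up time had
    an observed event. It is concordant when that subject also has the higher
    predicted risk. Pairs with tied times are skipped (a simplification
    assumed for this sketch); tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, so the order is unknown
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration only.
retained = [1, 0, 1, 0]            # hypothetical 6-month retention labels
retention_score = [0.9, 0.3, 0.4, 0.6]
days_in_treatment = [30, 90, 180, 180]
discontinued = [1, 1, 0, 0]        # 1 = discontinuation observed, 0 = censored
dropout_risk = [0.8, 0.5, 0.6, 0.2]

print(roc_auc(retained, retention_score))
print(c_index(days_in_treatment, discontinued, dropout_risk))
```

Both metrics take the value 0.5 for a model that ranks cases at random and 1.0 for a perfect ranking, which is the scale on which the reported 0.65 should be read.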