Optimizing Hepatitis B Virus Screening in the United States Using a Simple Demographics-Based Model. Hepatology (Baltimore, Md.) Ramrakhiani, N. S., Chen, V. L., Le, M., Yeo, Y. H., Barnett, S. D., Waljee, A. K., Zhu, J., Nguyen, M. H. 2021

Abstract

Chronic hepatitis B (CHB) affects over 290 million people globally and only 10% have been diagnosed, presenting a severe gap that must be addressed. We developed logistic regression and machine learning (random forest) models to accurately identify patients with HBV, using only easily-obtained demographic data from a population-based data set. We identified participants with data on hepatitis B surface antigen (HBsAg), birth year, sex, race/ethnicity, and birthplace from 10 cycles of the National Health and Nutrition Examination Survey (NHANES, 1999-2018) and divided them into two cohorts: training (cycles 2, 3, 5, 6, 8, 10; n = 39,119) and validation (cycles 1, 4, 7, 9; n = 21,569). We then developed and tested our two models. The overall cohort was 49.2% male, 39.7% White, 23.2% Black, 29.6% Hispanic, and 7.5% Asian/Other, with a median birth year of 1973. In multivariable logistic regression, the following factors were associated with HBV infection: birth year 1991 or after (adjusted OR [aOR] of 0.28, P < 0.001), male sex (aOR 1.49, P = 0.0080), Black and Asian/Other vs. White (aOR 5.23 and 9.13, P < 0.001 for both), and being United States-born (vs. foreign-born) (aOR 0.14, P < 0.001). We found that the machine learning model consistently outperformed the logistic regression model, with higher AUROC values (0.83 vs. 0.75 in validation cohort, P < 0.001) and better differentiation of high and low risk individuals. Our machine learning model provides a simple, targeted approach to HBV screening, using only easily-obtained demographic data.
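As a rough illustration of how the adjusted odds ratios above combine, the following Python sketch multiplies the reported aORs for the factors present in a demographic profile. It is not the authors' model: the function name `relative_odds` and the dictionary keys are hypothetical, the sketch assumes a main-effects logistic model with no interaction terms, and because the abstract reports neither the intercept nor an aOR for Hispanic participants, it yields only odds relative to the reference profile, not a probability of infection.

```python
import math

# Adjusted odds ratios as reported in the abstract.
# Key names are illustrative labels, not variable names from the study.
REPORTED_AOR = {
    "born_1991_or_later": 0.28,
    "male": 1.49,
    "black": 5.23,        # vs. White
    "asian_other": 9.13,  # vs. White
    "us_born": 0.14,      # vs. foreign-born
}


def relative_odds(profile):
    """Return the odds of HBV infection relative to the reference profile.

    `profile` is a set of keys from REPORTED_AOR. In a main-effects logistic
    model the log-odds contributions add, so the odds ratios multiply.
    """
    if {"black", "asian_other"} <= set(profile):
        raise ValueError("race/ethnicity categories are mutually exclusive")
    log_odds = sum(math.log(REPORTED_AOR[factor]) for factor in profile)
    return math.exp(log_odds)


if __name__ == "__main__":
    # Foreign-born Asian/Other male vs. the reference profile.
    print(round(relative_odds({"male", "asian_other"}), 2))
    # US-born participant born in 1991 or later vs. the reference profile.
    print(round(relative_odds({"us_born", "born_1991_or_later"}), 4))
```

Under these assumptions, an empty profile returns 1.0 (the reference group itself), and protective factors such as US birth pull the relative odds below 1.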

View details for DOI 10.1002/hep.32142

View details for PubMedID 34496066