Can large language models address unmet patient information needs and reduce provider burnout in the management of thyroid disease?
Surgery. Raghunathan, R., Jacobs, A. R., Sant, V. R., King, L. J., Rothberger, G., Prescott, J., Allendorf, J., Seib, C. D., Patel, K. N., Suh, I. 2024
Abstract
BACKGROUND: Patient electronic messaging has increased clinician workload, contributing to burnout. Large language models can respond to these patient queries, but no studies exist on large language model responses in thyroid disease.
METHODS: This cross-sectional study randomly selected 33 of 52 patient questions found on Reddit/askdocs. Questions were found through a "thyroid+cancer" or "thyroid+disease" search and had verified-physician responses. Additional responses were generated using ChatGPT-3.5 and GPT-4. Questions and responses were anonymized and graded for accuracy, quality, and empathy using a 4-point Likert scale by blinded providers, including 4 surgeons, 1 endocrinologist, and 2 physician assistants (n = 7). Results were analyzed using a single-factor analysis of variance.
RESULTS: For accuracy, the results averaged 2.71/4 (standard deviation 1.04), 3.49/4 (0.391), and 3.66/4 (0.286) for physicians, GPT-3.5, and GPT-4, respectively (P < .01), where 4 = completely true information, 3 = greater than 50% true information, and 2 = less than 50% true information. For quality, the results were 2.37/4 (standard deviation 0.661), 2.98/4 (0.352), and 3.81/4 (0.36) for physicians, GPT-3.5, and GPT-4, respectively (P < .01), where 4 = provided information beyond what was asked, 3 = completely answers the question, and 2 = partially answers the question. For empathy, the mean scores were 2.37/4 (standard deviation 0.661), 2.80/4 (0.582), and 3.14/4 (0.578) for physicians, GPT-3.5, and GPT-4, respectively (P < .01), where 4 = anticipates and infers patient feelings from the expressed question, 3 = mirrors the patient's feelings, and 2 = contains no dismissive comments. Responses by GPT were ranked first 95% of the time.
CONCLUSIONS: Large language model responses to patient queries about thyroid disease have the potential to be more accurate, complete, empathetic, and consistent than physician responses.
View details for DOI 10.1016/j.surg.2024.06.075
View details for PubMedID 39424485
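
The single-factor analysis of variance named in the Methods can be illustrated with a minimal sketch. The scores below are invented placeholders, not the study's data, and the function name `one_way_anova_f` is hypothetical; the abstract does not say which software the authors used. The sketch computes only the F statistic and its degrees of freedom; a P value would additionally require the F distribution, for example from a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA.

    `groups` is a list of lists, one list of scores per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placeholder 4-point Likert scores (assumed for illustration only).
physician = [2, 3, 3, 2, 4]
gpt35 = [3, 4, 3, 4, 3]
gpt4 = [4, 4, 3, 4, 4]

f_stat, df_b, df_w = one_way_anova_f([physician, gpt35, gpt4])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A larger F indicates that the differences between the group means are large relative to the spread of scores within each group.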