Development and Validation of the Stanford Obstetric Recovery Checklist (STORK): A Delphi Consensus and Multicenter Clinical Validation Study.
JAMA Network Open 2025; 8 (4): e255713
Abstract
Existing patient-reported outcome measures (PROMs) evaluating outpatient postpartum recovery lack content validity and were mostly not designed for this population. A Delphi process was performed, aiming to develop a patient-reported outcome measure for outpatient postpartum recovery and then evaluate it in a multicenter cohort study.
Development of the Stanford Obstetric Recovery Checklist (STORK) involved 3 phases: (1) postpartum recovery questions were identified in published reviews; (2) after institutional review board approval, 16 multidisciplinary experts and patient stakeholders participated in 3 Delphi rounds (January 11 to April 12, 2021) to select items, resulting in the development of STORK (47 items; total score range, 0-188, with 0 indicating the worst recovery and 188 indicating the best recovery); and (3) cognitive debriefing interviews were conducted with 10 postpartum individuals to finalize STORK items. Individuals then completed STORK during their inpatient stay and at 2, 6, and 12 weeks post partum in a prospective, 3-center, US longitudinal cohort study conducted from June 13, 2022, to February 28, 2023. Recruitment occurred until 300 six-week STORK surveys were completed. STORK was evaluated at 6 weeks for validity (ability to measure recovery), reliability, and responsiveness. Validity included (1) structural validity (exploratory factor analysis using root mean square residual [RMSR]; <0.08 indicates a good fit); (2) convergent validity (correlation with global health visual analog scale score [GHVAS; scale, 0-100] and EuroQoL Five-Dimensions Three-Levels [EQ-5D-3L]); (3) discriminant validity (mean difference in STORK scores with GHVAS <70 vs ≥70); and (4) confirmatory telephone interviews with postpartum individuals scoring the highest and lowest 10th percentiles of STORK scores. Reliability (consistency of STORK scores) was evaluated using Cronbach α, interitem correlation, split-half reliability, and floor and ceiling effects. Responsiveness (ability of STORK to detect changes in recovery over time) was evaluated using percentage change in score from baseline to 12 weeks.
A total of 525 individuals were recruited after all delivery modes (response rate, 62% [324 of 525] at 6 weeks); 498 (mean [SD] age, 33.3 [4.9] years) completed baseline inpatient postpartum surveys. STORK demonstrated validity: (1) a 4-factor model was the best fit (RMSR = 0.05); (2) the correlation coefficient with GHVAS scores was 0.52 (95% CI, 0.43-0.61), and the correlation coefficient with EQ-5D-3L scores was -0.67 (95% CI, -0.76 to -0.63); (3) STORK was able to discriminate between patients reporting good and poor recovery (good recovery: median STORK score, 151 [IQR, 136-163] vs poor recovery: median STORK score, 129 [IQR, 107-148]; P?
DOI: 10.1001/jamanetworkopen.2025.5713
PubMed ID: 40244582
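The abstract names Cronbach α and percentage change from baseline as the reliability and responsiveness measures but does not give formulas. As a rough illustration only, the textbook definitions can be sketched in Python as below. The function names `cronbach_alpha` and `percent_change` are hypothetical, the toy inputs are invented for demonstration and are not study data, and nothing here is claimed to reproduce the authors' analysis code.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Textbook Cronbach alpha for a respondents-by-items score table.

    responses: list of rows, one per respondent, each row a list of
    numeric item scores of equal length (hypothetical input layout).
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sum of the sample variances of each item (column).
    item_var_sum = sum(variance(col) for col in zip(*responses))
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def percent_change(baseline, follow_up):
    """Percentage change in a score from baseline to a later time point."""
    if baseline == 0:
        raise ValueError("baseline score must be nonzero")
    return (follow_up - baseline) / baseline * 100


# Toy data (invented): three respondents, three perfectly consistent items.
toy = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy)
change = percent_change(100, 150)
```

With perfectly consistent items, as in the toy table, alpha reaches its maximum of 1; real questionnaire data would give lower values.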