Random survival forests using linked data to measure illness burden among individuals before or after a cancer diagnosis: Development and internal validation of the SEER-CAHPS illness burden index

Lisa M. Lines, Julia Cohen, Justin Kirschner, Michael T. Halpern, Erin E. Kent, Michelle A. Mollica, Ashley Wilder Smith

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Purpose: To develop and internally validate an illness burden index among Medicare beneficiaries before or after a cancer diagnosis.

Methods: Data source: SEER-CAHPS, linking Surveillance, Epidemiology, and End Results (SEER) cancer registry, Medicare enrollment and claims, and Medicare Consumer Assessment of Healthcare Providers and Systems (Medicare CAHPS) survey data providing self-reported sociodemographic, health, and functional status information. To generate a score for everyone in the dataset, we tabulated 4 groups within each annual subsample (2007–2013): 1) Medicare Advantage (MA) beneficiaries or 2) Medicare fee-for-service (FFS) beneficiaries, surveyed before cancer diagnosis; 3) MA beneficiaries or 4) Medicare FFS beneficiaries surveyed after diagnosis. Random survival forests (RSFs) predicted 12-month all-cause mortality and drew predictor variables (mean per subsample = 44) from 8 domains: sociodemographic, cancer-specific, health status, chronic conditions, healthcare utilization, activity limitations, proxy, and location-based factors. Roughly two-thirds of the sample was held out for algorithm training. Error rates based on the validation (“out-of-bag,” OOB) samples reflected the correctly classified percentage. Illness burden scores represented predicted cumulative mortality hazard.

Results: The sample included 116,735 Medicare beneficiaries with cancer, of whom 73% were surveyed after their cancer diagnosis; overall mean mortality rate in the 12 months after survey response was 6%. SEER-CAHPS Illness Burden Index (SCIBI) scores were positively skewed (median range: 0.29 [MA, pre-diagnosis] to 2.85 [FFS, post-diagnosis]; mean range: 2.08 [MA, pre-diagnosis] to 4.88 [MA, post-diagnosis]). The highest decile of the distribution had a 51% mortality rate (range: 29–71%); the bottom decile had a 1% mortality rate (range: 0–2%). The error rate was 20% overall (range: 9% [among FFS enrollees surveyed after diagnosis] to 36% [MA enrollees surveyed before diagnosis]).

Conclusions: This new morbidity measure for Medicare beneficiaries with cancer may be useful to future SEER-CAHPS users who wish to adjust for comorbidity.
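The two quantities the abstract reports, a cumulative mortality hazard and an error rate, can be illustrated with a small sketch. It is not the authors' code. It assumes the usual random survival forest conventions: cumulative hazard is built from Nelson-Aalen estimates, and prediction error is 1 minus Harrell's concordance index. The abstract does not state which software or error definition was used. The function names and the six-person toy dataset are hypothetical, and pairs with tied observed times are skipped for simplicity.

```python
from itertools import combinations


def nelson_aalen(times, events):
    """Return [(t, H(t))] at each distinct event time.

    H(t) is the running sum of d_j / n_j, where d_j is the number of
    deaths at time t_j and n_j the number still at risk just before t_j.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    cum_hazard = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group everyone who dies or is censored at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            cum_hazard += deaths / n_at_risk
            curve.append((t, cum_hazard))
        n_at_risk -= leaving
    return curve


def concordance_error(times, events, scores):
    """1 - Harrell's C: share of comparable pairs ranked in the wrong order.

    A pair is comparable when the person with the shorter observed time
    died. Simplification (assumption): pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # a is the person with the shorter observed time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0   # higher score died first: correct order
        elif scores[a] == scores[b]:
            concordant += 0.5   # tied scores count as half
    return 1.0 - concordant / comparable


# Hypothetical toy data: months to death, or censoring at 12 months.
times = [2, 5, 12, 12, 8, 12]
events = [1, 1, 0, 0, 1, 0]                 # 1 = died, 0 = censored
scores = [4.1, 0.3, 0.2, 1.5, 2.2, 0.4]     # made-up burden scores

curve = nelson_aalen(times, events)
err = concordance_error(times, events, scores)
print(curve[-1])        # cumulative hazard at the last event time
print(round(err, 2))    # 0.25
```

Under this definition, an error rate of 0.25 means that a quarter of comparable pairs were ordered incorrectly by the score. An error of 0.5 would correspond to ranking no better than chance.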

Original language: English (US)
Article number: 104305
Journal: International Journal of Medical Informatics
Volume: 145
State: Published - Jan 2021

Keywords

  • Cancer registry data
  • Claims data
  • Morbidity
  • Mortality
  • Random survival forests
  • Survey data

ASJC Scopus subject areas

  • Health Informatics
