Abstract

A predictive model for survival in non-small cell lung cancer (NSCLC) based on electronic health record (EHR) and tumor sequencing data at the Department of Veterans Affairs (VA).

Authors
Nathanael Fillmore (VA Boston Healthcare System and Dana-Farber Cancer Institute, Boston, MA), Jamie Ramos-Cejudo, David Cheng, David P. Tuck, Ayesha Rizwan Sheikh, Daniel Chen, Danne Elbers, Feng-Chi Sung, Brett Johnson, Colleen Shannon, Karen Pierce-Murray, Kelly Gaynor, Corri Dedomenico, Sarah Schiller, Samuel Ajjarapu, Robert Hall, Siamack Ayandeh, Frank Meng, Mary T. Brophy, Nhan Do

Organizations
VA Boston Healthcare System and Dana-Farber Cancer Institute, Boston, MA; VA Boston Healthcare System, Boston, MA; VA Boston Healthcare System and Boston University Medical Center, Boston, MA; Boston Medical Center/Boston University School of Medicine, Boston, MA; Boston University School of Medicine, Boston, MA; VA Boston Healthcare System and Univ. Buffalo Sch. of Med. and Biomed. Sciences, Boston, MA; VA Boston Health Care System, Boston, MA; VA Boston Healthcare System and Boston University School of Medicine, Boston, MA

Background: Machine learning tools built on EHR data hold promise for helping patients avoid unnecessary risks associated with lung cancer and its treatment. Molecular genetic profiling is also becoming an integral tool that clinicians use to individualize lung cancer treatment. However, relatively few individualized survival models have been built that integrate both kinds of data. Here, we combine real-world EHR and tumor sequencing data from the VA Precision Oncology Data Repository (PODR) to build accurate, individualized survival predictions for patients newly diagnosed with NSCLC.

Methods: We identified a cohort of 356 VA patients newly diagnosed with NSCLC for whom EHR, cancer registry, and targeted tumor sequencing data are available in PODR. We defined 41 features reflecting 15 baseline clinical and demographic characteristics from the EHR and registry, such as age, race, stage, histology, and therapy.
We also defined features reflecting 206 clinically actionable somatic variants, from which we selected 5 important variants for inclusion in the model, together with the total number of mutations. We trained a random forest classifier to predict 1-year survival and assessed precision, recall, and area under the receiver operating characteristic (ROC) curve (AUC) using 5-fold cross-validation.

Results: Mean age at diagnosis was 66 years. Most patients had late-stage disease (stage I, 15%; II, 6%; III, 15%; IV, 44%), and 59% of patients received systemic therapy. Within 1 year of diagnosis, 45% of patients had died; the remaining 55% survived beyond 1 year. The model for 1-year survival achieved a cross-validated AUC of 0.79 (SD 0.08), a precision of 0.79 (SD 0.07), and a recall of 0.74 (SD 0.07), suggesting that a model combining clinical and genomic features is effective at predicting 1-year survival.

Conclusions: By integrating real-world EHR and sequencing data, we built a predictive model of 1-year survival in NSCLC patients at the VA that achieved a cross-validated AUC of 0.79. Once validated in a larger cohort (validation is ongoing), such a model could provide individualized predictions that inform patient care and improve outcomes.
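The evaluation protocol in Methods (5-fold cross-validation, with AUC, precision, and recall each reported as a mean and SD across folds) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`stratified_folds`, `cross_validate`, `midpoint_model`) are hypothetical; label 1 is assumed to be the positive class (the abstract does not say which 1-year outcome is coded positive); predictions use a 0.5 score threshold; folds are stratified by outcome; and the SD is the sample SD. To keep the sketch dependency-free, the random forest is replaced by a hypothetical one-feature stand-in model.

```python
import math
import random
import statistics
from typing import Callable, Sequence

# A fit_predict function trains on (X_train, y_train) and returns one score per
# row of X_test; a higher score means class 1 (assumed positive class) is more likely.
FitPredict = Callable[[list, list, list], Sequence[float]]


def stratified_folds(y: Sequence[int], k: int = 5, seed: int = 0) -> list[list[int]]:
    """Split row indices into k folds with a similar class balance in each fold."""
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        # Deal this class's indices round-robin across the k folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(y_true, scores, threshold=0.5):
    """Precision and recall for class 1 after thresholding the scores (threshold is an assumption)."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, t in zip(pred, y_true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, y_true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, y_true) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cross_validate(X, y, fit_predict: FitPredict, k: int = 5, seed: int = 0):
    """Return {metric: (mean, sample SD)} over k stratified folds."""
    per_fold = {"auc": [], "precision": [], "recall": []}
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        # Train on the other k-1 folds, score the held-out fold.
        scores = fit_predict([X[i] for i in train_idx],
                             [y[i] for i in train_idx],
                             [X[i] for i in test_idx])
        y_test = [y[i] for i in test_idx]
        per_fold["auc"].append(auc(y_test, scores))
        p, r = precision_recall(y_test, scores)
        per_fold["precision"].append(p)
        per_fold["recall"].append(r)
    return {m: (statistics.mean(v), statistics.stdev(v)) for m, v in per_fold.items()}


def midpoint_model(X_train, y_train, X_test):
    """Hypothetical stand-in for a random forest: thresholds feature 0 at the
    midpoint between the class-0 and class-1 training means."""
    m1 = statistics.mean(x[0] for x, t in zip(X_train, y_train) if t == 1)
    m0 = statistics.mean(x[0] for x, t in zip(X_train, y_train) if t == 0)
    mid, direction = (m0 + m1) / 2, (1.0 if m1 >= m0 else -1.0)
    # Logistic function written via tanh so it cannot overflow; score > 0.5 past the midpoint.
    return [0.5 * (1.0 + math.tanh(direction * (x[0] - mid) / 2)) for x in X_test]


if __name__ == "__main__":
    # Toy, synthetic data (not from the study): one feature that separates the classes.
    X = [[float(v)] for v in list(range(10)) + list(range(100, 110))]
    y = [0] * 10 + [1] * 10
    print(cross_validate(X, y, midpoint_model))
```

To plug in an actual random forest, one option (not necessarily what the authors used) is scikit-learn, passing something like `lambda Xtr, ytr, Xte: RandomForestClassifier(random_state=0).fit(Xtr, ytr).predict_proba(Xte)[:, 1]` as `fit_predict`. Keeping the model behind a single function makes it easy to swap models while holding the fold assignment and metrics fixed.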