Abstract

Explainable AI and machine learning algorithms to predict treatment failures for patients with cancer.

Author
Muddassar Farooq, Muhammad Usamah Shahid
CureMD Inc, New York, NY
Full text
Research Funding: Institutional Funding, CureMD

Background: Cancer patients may undergo lengthy and painful chemotherapy treatments comprising several successive regimens, or plans. Treatment inefficacy and other adverse events can lead to these plans being discontinued (failing) or changed prematurely, which results in significant physical, financial, and emotional toxicity for patients and their families. In this work, we build AI-driven treatment failure models that use real-world evidence gathered from patients’ profiles in an oncology EMR/EHR system, with the goal of predicting, at the time a plan is prescribed, the likelihood that it will be discontinued. The selected AI models achieve a prediction accuracy of more than 80% and also provide reasons for their inferences.

Methods: Inclusion and Exclusion Criteria: We analyze de-identified and anonymized electronic health records of patients, together with their prescribed chemotherapies, for the five primary cancer diagnoses that have the highest plan discontinuation rates between 2015 and 2022: ICD-10 codes C18 (colon), C34 (bronchus and lung), C50 (breast), C61 (prostate) and C90 (multiple myeloma and malignant plasma cell neoplasms). Patients with all other cancer types are excluded.

AI Models: Features that influence treatment failure are engineered separately for each cancer type from the therapeutic classification of drugs, diagnosis codes, comorbidity scores, and tumor and biomarker information extracted from notes and lab tests. We use only features that are available at the time a treatment plan is selected.
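The restriction to features available when a plan is selected can be illustrated with a small sketch. It is not the authors' pipeline: the `Observation` schema, the feature names and the `features_at_prescription` helper are hypothetical, and the sketch assumes every record carries a date that can be compared with the prescription date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """One dated fact from a patient's record (hypothetical schema)."""
    name: str          # e.g. a lab test or a comorbidity score label
    value: float
    recorded_on: date


def features_at_prescription(observations, prescribed_on):
    """Return the latest value per feature recorded on or before prescribed_on.

    Anything recorded after the prescription date is dropped, so the model
    never sees information that was unavailable when the plan was chosen.
    """
    latest = {}
    for obs in sorted(observations, key=lambda o: o.recorded_on):
        if obs.recorded_on <= prescribed_on:
            latest[obs.name] = obs.value  # later eligible records overwrite earlier ones
    return latest


# Illustrative, made-up history for a single patient.
history = [
    Observation("hemoglobin", 11.2, date(2020, 1, 10)),
    Observation("hemoglobin", 9.8, date(2020, 3, 2)),
    Observation("comorbidity_score", 3.0, date(2020, 2, 20)),
    Observation("hemoglobin", 8.9, date(2020, 4, 15)),  # recorded after prescription
]

features = features_at_prescription(history, prescribed_on=date(2020, 3, 5))
print(features)
```

In this toy history the April hemoglobin value is ignored because it postdates the prescription, which is the kind of leakage the stated restriction guards against.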
Several machine learning classifiers are investigated, and three tree ensembles (random forests, XGBoost and boosted forests) are further evaluated on the validation set to fine-tune their learning parameters, with the objective of reducing the complexity of the decision trees to improve interpretability without significantly compromising accuracy.

Results: Our pilot studies reveal that boosted forests comprising 5 random forests, each with 5 trees of depth 10, offer the best compromise between performance and interpretability. Once trained, the models are evaluated on unseen datasets, and four performance measures are reported in the table below. On average, 15 rules are autonomously generated to explain a treatment failure inference for each cancer type, and generally 6 of them have a significant support of 30 samples or more.

Conclusions: Using machine learning algorithms to predict the efficacy of chemotherapy regimens by deriving inferences from patients’ EMR/EHR data is an emerging yet challenging research domain. Our studies demonstrate that, among the models evaluated, boosted forests are the best suited to the treatment failure use case. In future, we want to validate the system in controlled clinical trials with the help of oncologists.

Code  Cohort size  Accuracy  Specificity  F1 score  AUROC
C18   1034         0.83      0.90         0.78      0.90
C34   1547         0.81      0.93         0.72      0.86
C50   2184         0.87      0.92         0.82      0.90
C61   1074         0.81      0.86         0.76      0.85
C90   866          0.85      0.89         0.83      0.92
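For readers unfamiliar with the four reported measures, the following minimal sketch shows their standard definitions. It makes several assumptions that the abstract does not state: label 1 stands for a discontinued (failed) plan, predictions are obtained by thresholding a score at 0.5, and both classes are present in the data. The toy labels and scores are invented for illustration and are unrelated to the study cohorts.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(y_true, y_pred):
    """Accuracy, specificity and F1 score from hard predictions.

    Assumes at least one positive and one negative example, so no
    denominator is zero.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),       # true negative rate
        "f1": 2 * tp / (2 * tp + fp + fn),   # harmonic mean of precision and recall
    }


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which is
    fine for a small illustration.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Invented toy data: 1 = plan discontinued (assumed positive class).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = threshold_metrics(y_true, y_pred)
metrics["auroc"] = auroc(y_true, scores)
print(metrics)
```

Accuracy, specificity and F1 score depend on the chosen decision threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why the two kinds of measure can move independently across cancer types.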
