Abstract

EGFR evaluation in non-small cell lung cancer: An artificial intelligence approach to pre-molecular analysis

Full text
Background
Non-small cell lung cancer (NSCLC) requires multiple genomic testing modalities to optimize patient outcomes. Foremost among NSCLC biomarker tests is EGFR sequencing. Sequencing comes with many challenges, including long turnaround time, high tissue requirements from small biopsies, and cost. An AI model using only digital whole slide images (WSIs) can act as a rapid screening test to prioritize tissue for proper sequencing without expending tissue.

Methods
A vision transformer (ViT) base architecture is trained for classification of acinar, solid, lepidic, papillary, and micropapillary morphologies, using 1 million 224 × 224 pixel patches extracted from 3475 WSIs. The training utilizes cross-entropy loss with the Adam optimizer, a learning rate of 1e-4, and a cosine weight decay scheduler. The pretrained encoder allows for extraction of 768-dimensional feature vectors from the last hidden layer for downstream tasks. For EGFR prediction, each of the 1558 training WSIs is decomposed into 224 × 224 pixel patches and a feature embedding is extracted for each patch. EGFR WSI labels are then predicted using a gated attention-based multiple instance learning model. The model was optimized on 260 WSIs to obtain the best AUC. The best model was evaluated on a held-out set of 6300 WSIs before integration into a mock clinical workflow, enabling in-real-time (IRT) EGFR prediction for 7 slides. The informatics backbone identifies each WSI at the time of scanning and transfers the slide for inference, which is completed within 30 minutes of scanning.

Results
On the validation dataset of 260 cases, our model exhibited an area under the curve (AUC) of 0.93 with a specificity of 0.90 and a sensitivity of 0.88. The model, assessed on an independent validation set of 6300 cases, maintained a high AUC of 0.89 with the following negative and positive predictive values (NPV/PPV): NPV = 0.90; PPV = 0.71.
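For readers less familiar with the screening metrics reported here, the following is a minimal sketch of how sensitivity, specificity, PPV, and NPV follow from thresholded slide-level scores. The function names, the toy scores and labels, and the 0.5 threshold are all illustrative assumptions; none of them are taken from the study.

```python
import math


def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when a slide is called positive if score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def screening_metrics(tp, fp, tn, fn):
    """Standard definitions of the four metrics quoted in the abstract."""
    return {
        "sensitivity": _ratio(tp, tp + fn),  # mutated slides correctly flagged
        "specificity": _ratio(tn, tn + fp),  # wild-type slides correctly cleared
        "ppv": _ratio(tp, tp + fp),          # positive calls that are truly mutated
        "npv": _ratio(tn, tn + fn),          # negative calls that are truly wild-type
    }


# Toy data (hypothetical, NOT study data): model scores and sequencing labels.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]

counts = confusion_counts(toy_scores, toy_labels, threshold=0.5)
metrics = screening_metrics(*counts)
print(counts, metrics)
```

Under these definitions, an NPV of 1.0 means that no slide called negative was a false negative, while PPV and NPV both shift with the prevalence of the mutation in the cohort being tested.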
On the IRT cohort, using the same threshold: NPV = 1.0; PPV = 0.66.

Conclusions
Implementing such a model, which can be run IRT on clinical WSIs, can provide rapid insight and inform ongoing testing protocols (e.g., prioritizing tissue for EGFR confirmation when positive or for full genomics when negative). Continuous refinement and integration of IRT data will enhance performance to align with clinical process requirements.

Editorial acknowledgement
During the preparation of this work the author(s) used ChatGPT in order to construct the abstract title. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Legal entity responsible for the study
The Warren Alpert Center for Digital and Computational Pathology, Memorial Sloan Kettering Cancer Center.

Funding
The Warren Alpert Foundation, The Warren Alpert Center for Digital and Computational Pathology, Memorial Sloan Kettering Cancer Center.

Disclosure
C.M. Vanderbilt: Financial Interests, Personal, Stocks or ownership: Paige AI. T. Fuchs: Financial Interests, Personal, Advisory Board, Founder, Equity holder, etc: Paige AI. M. Hameed: Financial Interests, Personal, Other, Fiduciary Role/Position: USCAP. A. Dogan: Financial Interests, Personal, Other, Professional Services and Activities: Incyte. All other authors have declared no conflicts of interest.
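As an illustration of the gated attention-based multiple instance learning step named in the Methods, the sketch below pools a bag of 768-dimensional patch embeddings into one slide-level probability. It uses the commonly cited gated attention formulation (a tanh branch multiplied elementwise by a sigmoid gate, followed by a softmax over patches) and is not necessarily the authors' implementation. The attention hidden size of 128, the 50-patch bag, the parameter names, and the random untrained weights are all assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_attention_pool(H, V, U, w):
    """Pool patch embeddings H (n_patches, d) into one slide embedding (d,).

    V and U have shape (d, hidden); w has shape (hidden,).
    Returns the slide embedding and the per-patch attention weights.
    """
    gated = np.tanh(H @ V) * sigmoid(H @ U)   # (n_patches, hidden)
    scores = gated @ w                        # one raw score per patch
    scores = scores - scores.max()            # stabilise the softmax
    attention = np.exp(scores) / np.exp(scores).sum()
    slide_embedding = attention @ H           # attention-weighted average
    return slide_embedding, attention


def predict_slide(H, params):
    """Return a slide-level probability and the attention weights."""
    z, attention = gated_attention_pool(H, params["V"], params["U"], params["w"])
    logit = z @ params["w_cls"] + params["b_cls"]
    return float(sigmoid(logit)), attention


# Hypothetical sizes: d = 768 comes from the abstract; hidden = 128 is assumed.
rng = np.random.default_rng(0)
d, hidden, n_patches = 768, 128, 50

# Random, untrained parameters, used only to show the shapes involved.
params = {
    "V": rng.normal(scale=0.01, size=(d, hidden)),
    "U": rng.normal(scale=0.01, size=(d, hidden)),
    "w": rng.normal(scale=0.01, size=hidden),
    "w_cls": rng.normal(scale=0.01, size=d),
    "b_cls": 0.0,
}

# Stand-in for the patch embeddings of one WSI (random, not real features).
H = rng.normal(size=(n_patches, d))
probability, attention = predict_slide(H, params)
print(probability, attention.shape)
```

Because the attention weights sum to one across patches, they can also be read as a rough indication of which patches contributed most to the slide-level call.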