Abstract

Optimizing lung cancer screening: Independent verification of an AI/ML computer-aided detection and characterization software as medical device

Background
Detection and risk stratification of pulmonary nodules during a lung cancer screening (LCS) examination is time-consuming and prone to false positives and missed detections. Artificial intelligence (AI) applied to these tasks has outperformed human readers in detection and current risk models (Brock, Mayo) in risk prediction. Here, we present the results of an independent verification study of an AI/ML computer-aided detection and characterization software, exploring its detection and characterization performance on an external dataset.

Methods
The model was trained on 10,872 patients (543 cancers) from the NLST (National Lung Screening Trial) and LIDC (Lung Image Database Consortium) cohorts, which were independently annotated. For independent verification, 264 patients meeting the USPSTF criteria (age 50-80 years, smoker) were collected from the EU (26%) and the USA (74%). The dataset comprised 88 cancer patients and 176 benign patients. Mean nodule size was 6.2 ± 3.2 mm for all nodules and 17.5 ± 5.8 mm for cancerous nodules. To generate reference standards, two radiologists independently annotated each patient, and a third acted as adjudicator to reach a consensus on nodule location and diagnosis (established by histopathology or ≥12-month stability).

Results
The verification AUC for risk prediction was 0.95, with a sensitivity of 93.2% and a specificity of 87.5% at the operating point maximizing the Youden index. Cancer detection sensitivity was 91.2%, with an average of 0.44 false-positive detections per scan. Performance was consistent across multiple technical and clinical parameters, including CT manufacturer, reconstruction kernel hardness, slice thickness, patient sex, data source, and nodule solidity (see the table).
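The metrics reported above follow standard definitions, and a minimal Python sketch of them may help readers interpret the numbers. This is not the authors' evaluation code, which the abstract does not describe: the function names (`auc`, `youden_operating_point`, `detection_metrics`, `subgroup_auc`) and the toy data are hypothetical. The sketch assumes one risk score per patient with a binary label (1 = cancer), that a case is called positive when its score is at or above the threshold, and that false positives per scan are counted as detections not matching any reference nodule, divided by the number of scans.

```python
from collections import defaultdict


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen cancer case scores higher than a randomly chosen benign case
    (ties count 1/2). Requires at least one case of each class."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_operating_point(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing the Youden index
    J = sensitivity + specificity - 1, calling a case positive if score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:  # keep the first threshold on ties
            best = (j, t, sens, spec)
    return best[1:]


def detection_metrics(n_cancers_detected, n_cancers, n_false_positives, n_scans):
    """Cancer detection sensitivity and mean false-positive detections per scan
    (assumes detections were already matched against the reference standard)."""
    return n_cancers_detected / n_cancers, n_false_positives / n_scans


def subgroup_auc(scores, labels, groups):
    """AUC computed separately within each subgroup (e.g. manufacturer, sex)."""
    by_group = defaultdict(lambda: ([], []))
    for s, y, g in zip(scores, labels, groups):
        by_group[g][0].append(s)
        by_group[g][1].append(y)
    return {g: auc(s, y) for g, (s, y) in by_group.items()}


# Toy data for illustration only (not study data).
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25]
labels = [1, 1, 1, 0, 1, 1, 0, 0]
print(auc(scores, labels))                     # 13/15, about 0.867
print(youden_operating_point(scores, labels))  # threshold 0.45, sensitivity 1.0, specificity about 0.667
print(detection_metrics(9, 10, 11, 25))        # (0.9, 0.44)
```

In practice, `sklearn.metrics.roc_auc_score` and `sklearn.metrics.roc_curve` compute the same AUC and the threshold sweep more efficiently; the stdlib version above only makes each definition explicit. The per-subgroup values in the table below correspond to calling something like `subgroup_auc` with a grouping key such as CT manufacturer or data source.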
Table: 1192P
Subclass type      Subclass               AUC
Manufacturer       SIEMENS Healthineers   0.96
                   GE Healthcare          0.94
                   Canon/Toshiba          0.95
Kernel             Sharp                  0.98
                   Average                0.94
                   Soft                   0.93
Slice thickness    0.5-0.75 mm            0.96
                   1-1.25 mm              0.94
                   1.25-1.5 mm            0.96
Sex                Female                 0.95
                   Male                   0.95
Source             EU                     0.90
                   USA                    0.97
Nodule solidity    Solid                  0.96
                   Part-solid             0.93

Conclusions
This study demonstrates that the AI software is robust to external data, with performance similar to that on the NLST test set (AUC: 0.97) and consistent across subclasses. More accurate AI-driven screening promises to benefit LCS by reducing unneeded examinations, costs, and radiologist time.

Legal entity responsible for the study
Median Technologies.

Funding
Median Technologies.

Disclosure
All authors have declared no conflicts of interest.