We obtained six NSCLC cohorts with pathological staging information from The Cancer Imaging Archive (TCIA), a public repository. This study was a retrospective analysis of anonymized data, and institutional review board (IRB) approval was obtained at Sungkyunkwan University. All data were acquired with written informed consent. The cohorts were NSCLC-Radio-genomics [16,17,18,19], NSCLC-Radiomics-Genomics [19,20,21], CPTAC-LUAD [19,22], CPTAC-LSCC [19,23], TCGA-LUAD [19,24], and TCGA-LUSC [19,25]. The CPTAC-LUAD and TCGA-LUAD cohorts contained lung adenocarcinoma cases, and the CPTAC-LSCC and TCGA-LUSC cohorts contained lung squamous cell carcinoma cases. The first two cohorts were combined and used as the training and validation sets, the CPTAC-LUAD and CPTAC-LSCC cohorts were combined to form the first test set, and the TCGA-LUAD and TCGA-LUSC cohorts were combined to form the second test set. Some patients had both contrast-enhanced and non-contrast CT, whereas others had only one of the two. We included only patients with non-contrast CT, which yielded 65 cases from NSCLC-Radio-genomics, 33 from NSCLC-Radiomics-Genomics, 17 from CPTAC-LUAD, 20 from CPTAC-LSCC, 13 from TCGA-LUAD, and 13 from TCGA-LUSC. These cases were grouped into the training (n = 90), validation (n = 8), CPTAC-test (n = 37), and TCGA-test (n = 26) cohorts. Details of the patient information are given in Table 1.

In all six cohorts, non-contrast CT imaging was performed with various scanners using the following parameters: detector collimation of 0.3184 to 1.3672 mm and reconstruction interval of 0.5 to 5 mm. The most common setting was a 0.625 mm detector collimation with a 1.34 mm reconstruction interval. Two cohorts, NSCLC-Radio-genomics and NSCLC-Radiomics-Genomics, did not provide the overall pathological stage but did provide TNM staging information. The T, N, and M components describe the size and extent of the primary tumor, the spread to nearby lymph nodes, and metastasis to distant sites, respectively.
We therefore computed the overall stage from the TNM information provided in the public database, following the American Joint Committee on Cancer (AJCC) Cancer Staging Manual, 7th edition [3]. The stages were then binarized into early stage (stage Ⅰ) and advanced stage (stages Ⅱ-Ⅳ).
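The TNM-to-stage conversion and the subsequent binarization can be sketched as a small lookup. The Python sketch below assumes T, N, and M labels arrive as strings such as "T2a", "N1", and "M0", and it encodes the AJCC 7th-edition lung cancer stage groups. The function names `overall_stage_ajcc7` and `is_advanced` are hypothetical, not the authors' code, and categories such as Tis, TX, or an undivided T1/T2 are rejected rather than guessed, since the text does not say how such labels were handled.

```python
# Hypothetical sketch: derive the AJCC 7th-edition overall stage from T, N, M
# labels, then binarize it (early = stage I, advanced = stages II-IV).

# Stage groups for M0 disease with N0 or N1 (AJCC 7th edition, lung).
_N0_N1_GROUPS = {
    ("T1A", "N0"): "IA", ("T1B", "N0"): "IA",
    ("T2A", "N0"): "IB", ("T2B", "N0"): "IIA",
    ("T3", "N0"): "IIB", ("T4", "N0"): "IIIA",
    ("T1A", "N1"): "IIA", ("T1B", "N1"): "IIA",
    ("T2A", "N1"): "IIA", ("T2B", "N1"): "IIB",
    ("T3", "N1"): "IIIA", ("T4", "N1"): "IIIA",
}
_T_CATEGORIES = {t for t, _ in _N0_N1_GROUPS}
_N_CATEGORIES = {"N0", "N1", "N2", "N3"}


def overall_stage_ajcc7(t: str, n: str, m: str) -> str:
    """Return the overall stage group (e.g. 'IIB') for a T/N/M triple."""
    t, n, m = (s.strip().upper() for s in (t, n, m))
    if m in {"M1", "M1A", "M1B"}:
        return "IV"  # any T, any N with distant metastasis
    if m != "M0" or t not in _T_CATEGORIES or n not in _N_CATEGORIES:
        raise ValueError(f"unsupported TNM combination: {t} {n} {m}")
    if n == "N3":
        return "IIIB"  # any T, N3, M0
    if n == "N2":
        return "IIIB" if t == "T4" else "IIIA"  # T4 N2 vs. T1-T3 N2
    return _N0_N1_GROUPS[(t, n)]


def is_advanced(stage: str) -> bool:
    """Binarize: stage I -> False (early), stages II-IV -> True (advanced)."""
    if stage in {"IA", "IB"}:
        return False
    if stage in {"IIA", "IIB", "IIIA", "IIIB", "IV"}:
        return True
    raise ValueError(f"unknown stage group: {stage}")


print(overall_stage_ajcc7("T2a", "N1", "M0"))  # IIA
print(is_advanced(overall_stage_ajcc7("T1b", "N0", "M0")))  # False
```

Raising an error on unsupported labels, instead of silently mapping them, makes any cases that need manual review visible before they reach the binary label.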
Table 1. Patient information for the various cohorts.
The NSCLC-Radio-genomics and NSCLC-Radiomics-Genomics cohorts were combined into one set, which we randomly split into a training cohort (n = 90) and a validation cohort (n = 8) while keeping the relative frequencies of early- and advanced-stage cases (0.63 and 0.37, respectively) similar between the two cohorts. The validation cohort was used to tune the hyperparameters of the two networks. We combined the CPTAC-LUAD and CPTAC-LSCC cohorts to form the first test cohort (CPTAC-test, n = 37) and the TCGA-LUAD and TCGA-LUSC cohorts to form the second test cohort (TCGA-test, n = 26). Datasets were assigned to cohorts according to the institution that collected the data.
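A random split that preserves the early/advanced stage proportions can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name `stratified_split`, the fixed seed, and the largest-remainder rounding used to turn per-stage proportions into whole-number validation quotas are all hypothetical choices. The usage lines run on synthetic labels, not on the study's patient data.

```python
import random
from collections import defaultdict


def stratified_split(ids, labels, n_val, seed=0):
    """Randomly split ids into (train, val), keeping each label's share of
    the validation set close to its share of the whole set (hypothetical helper)."""
    if len(ids) != len(labels):
        raise ValueError("ids and labels must have the same length")
    if not 0 <= n_val <= len(ids):
        raise ValueError("n_val must be between 0 and len(ids)")
    groups = defaultdict(list)
    for i, y in zip(ids, labels):
        groups[y].append(i)

    # Ideal (fractional) validation count per label, proportional to frequency.
    exact = {y: n_val * len(m) / len(ids) for y, m in groups.items()}
    quota = {y: int(q) for y, q in exact.items()}
    # Largest-remainder rounding: give the leftover slots to the labels with
    # the largest fractional parts, so the quotas sum exactly to n_val.
    leftover = n_val - sum(quota.values())
    by_remainder = sorted(exact, key=lambda y: exact[y] - quota[y], reverse=True)
    for y in by_remainder[:leftover]:
        quota[y] += 1

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for y, members in groups.items():
        shuffled = members[:]
        rng.shuffle(shuffled)
        val.extend(shuffled[:quota[y]])
        train.extend(shuffled[quota[y]:])
    return train, val


# Synthetic example: 98 cases labeled "early" or "advanced".
ids = list(range(98))
labels = ["early"] * 62 + ["advanced"] * 36
train, val = stratified_split(ids, labels, n_val=8)
print(len(train), len(val))  # 90 8
```

With a validation set this small, proportions can only be matched approximately; the rounding step simply guarantees that the requested validation size is met exactly.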