Lists of driver genes and mutations predicted by various algorithms (Table 2) applied to PanCanAtlas data were downloaded from https://gdc.cancer.gov/about-data/publications/pancan-driver (2020plus, CompositeDriver, DriverNet, HotMAPS, OncodriveFML) and https://karchinlab.github.io/CHASMplus (CHASMplus), and were received by personal communication from Francisco Martínez-Jiménez, Institute for Research in Biomedicine, Barcelona, francisco.martinez@irbbarcelona.org (dNdScv, OncodriveCLUSTL, OncodriveFML). All genes and mutations with q-value > 0.05 were removed. Additionally, a consensus driver gene list from 26 algorithms applied to PanCanAtlas data [8] was downloaded from https://www.cell.com/cell/fulltext/S0092-8674(18)30237-X and a COSMIC Cancer Gene Census (CGC) Tier 1 gene list [14] was downloaded from https://cancer.sanger.ac.uk/cosmic/census?tier=1. From the CGC list, only genes affected by somatic SNAs and CNAs present in the TCGA cancer types were used for further analyses. Cancer type names in the CGC list were manually converted to the closest possible TCGA cancer type abbreviation. Entrez Gene IDs were identified for each gene using its HUGO symbol and the external database ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz.
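The q-value filter and the symbol-to-Entrez mapping described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names, the `gene` and `q_value` field names, and the toy q-values are hypothetical. The sketch assumes only that the gene_info file is tab-separated with `GeneID` and `Symbol` columns, as in the NCBI format.

```python
import csv
import io

Q_VALUE_CUTOFF = 0.05  # threshold stated in the text


def filter_by_q_value(rows, cutoff=Q_VALUE_CUTOFF):
    """Remove rows with q-value > cutoff (hypothetical 'q_value' field)."""
    return [row for row in rows if float(row["q_value"]) <= cutoff]


def load_symbol_to_entrez(handle):
    """Build a symbol -> Entrez Gene ID map from a gene_info-style TSV.

    Assumes a header line containing 'GeneID' and 'Symbol' columns.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Symbol"]: int(row["GeneID"]) for row in reader}


def annotate_with_entrez(rows, symbol_to_entrez):
    """Attach an Entrez Gene ID to each row; None when the symbol is unmapped."""
    return [
        {**row, "entrez_id": symbol_to_entrez.get(row["gene"])}
        for row in rows
    ]


# Toy stand-in for Homo_sapiens.gene_info (the real file has more columns).
gene_info = io.StringIO(
    "#tax_id\tGeneID\tSymbol\n"
    "9606\t7157\tTP53\n"
    "9606\t3845\tKRAS\n"
)

# Hypothetical predictions; the q-values are illustrative only.
predictions = [
    {"gene": "TP53", "q_value": "0.001"},
    {"gene": "KRAS", "q_value": "0.2"},
    {"gene": "UNMAPPED1", "q_value": "0.05"},
]

symbol_to_entrez = load_symbol_to_entrez(gene_info)
kept = filter_by_q_value(predictions)
annotated = annotate_with_entrez(kept, symbol_to_entrez)
```

For the real gzipped file, a handle opened with `gzip.open(path, "rt")` could be passed to `load_symbol_to_entrez` in place of the in-memory stand-in.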
The sensitivity of each algorithm was assessed as the percentage of genes in a positive control list that were predicted as drivers by that algorithm, because Sensitivity = True positives/(True positives + False negatives). The specificity of each algorithm was assessed as the percentage of all genes not in a positive control list that were not predicted as drivers by that algorithm, because Specificity = True negatives/(True negatives + False positives). Three positive control lists were used: CGC Tier 1 genes affected by somatic SNAs or CNAs in TCGA cancer types, a list of genes identified by at least two of all our sources (including CGC and Bailey), and a list of genes identified by at least three of all our sources (including CGC and Bailey). For algorithms applied to individual cancer types, sensitivity was assessed separately as the percentage of gene-cohort pairs in a positive control list that were matched by gene-cohort pairs predicted by an algorithm, and specificity was assessed separately as the percentage of all gene-cohort pairs not in a positive control list that were not matched by gene-cohort pairs predicted by an algorithm. Benchmarking data are available as S1 Files.
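The definitions above can be expressed as set operations, as in the following minimal Python sketch. It is an illustration under stated assumptions rather than the benchmarking code itself: the function names, the toy gene labels, and the choice of `universe` (the full set of genes, or gene-cohort pairs, considered) are hypothetical. The same functions accept gene-cohort pairs if each item is a `(gene, cohort)` tuple.

```python
from collections import Counter


def consensus_positive_list(sources, min_sources):
    """Items reported by at least `min_sources` of the given sources.

    `sources` maps a source name to the set of items it reports.
    """
    counts = Counter(item for items in sources.values() for item in items)
    return {item for item, n in counts.items() if n >= min_sources}


def sensitivity(predicted, positives):
    """Percentage of positive-control items predicted: TP / (TP + FN)."""
    tp = len(predicted & positives)
    fn = len(positives - predicted)
    return 100.0 * tp / (tp + fn) if (tp + fn) else float("nan")


def specificity(predicted, positives, universe):
    """Percentage of non-control items not predicted: TN / (TN + FP)."""
    negatives = universe - positives
    fp = len(predicted & negatives)
    tn = len(negatives - predicted)
    return 100.0 * tn / (tn + fp) if (tn + fp) else float("nan")


# Toy data: nine hypothetical genes, four of them in the positive control list.
universe = {f"G{i}" for i in range(1, 10)}
positives = {"G1", "G2", "G3", "G4"}
predicted = {"G1", "G2", "G5"}

sens = sensitivity(predicted, positives)            # 2 of 4 positives found
spec = specificity(predicted, positives, universe)  # 4 of 5 negatives not called

# Toy sources for a consensus list requiring at least two sources.
sources = {"A": {"G1", "G2"}, "B": {"G2", "G3"}, "C": {"G2", "G3", "G4"}}
at_least_two = consensus_positive_list(sources, min_sources=2)
```

Specificity depends on how the universe is defined, so the denominator has to be fixed before algorithms are compared.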
The GISTIC2 results for TCGA cohorts were obtained from http://firebrowse.org. GISTIC2 data are available as S4 Files.