In this study, we used four RNA gene expression datasets (GSE2466, GSE19147, GSE50006, and GSE8835) to evaluate classification performance with CLL target genes. The datasets were selected using the Illumina BaseSpace Correlation Engine (http://www.illumina.com) and are publicly available at the NCBI Gene Expression Omnibus (www.ncbi.nlm.nih.gov/geo/). The data selection criteria were as follows: 1) the sample organism was Homo sapiens; 2) the data type was RNA expression; 3) the experimental design was CLL case vs. normal control. From each dataset, the expression data of normal controls and CLL patients were extracted and used for case/control classification. In each dataset, the genes were restricted to the CLL target genes curated in the database CLL_042017. The key statistics of the four datasets are summarized in Table 1.
Table 1. Statistics of the four gene expression datasets.
The gene expression profiles of the four datasets are also included in CLL_042017: CLL_042017→GSE2466, GSE19147, GSE50006, and GSE8835. Within each dataset, the SRVS-generated weights (SRVSScore) and the analysis of variance (ANOVA)-generated p-value scores (PValueScore; log-transformed p-values: -10*log(p-value)) are also presented. The p-value for a gene is generated from a one-way ANOVA of the case/control comparison using the corresponding expression data. The SRVSScore and the PValueScore represent the significance of a gene in the dataset according to the SRVS and ANOVA methods, respectively.
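The PValueScore computation for a single gene can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The function names are hypothetical, the expression values are made up for illustration, and the base of the logarithm is assumed to be 10 because the text does not state it. The F-distribution tail probability is obtained from the regularized incomplete beta function, evaluated with a standard continued fraction.

```python
import math

def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    ln_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
             + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(ln_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def anova_p_value(groups):
    """One-way ANOVA p-value for a list of groups (each a list of floats)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df1, df2 = k - 1, n - k
    if ss_within == 0.0:
        # Degenerate case: no within-group variation.
        return 0.0 if ss_between > 0.0 else 1.0
    f_stat = (ss_between / df1) / (ss_within / df2)
    # Upper tail of the F distribution with (df1, df2) degrees of freedom.
    return _betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_stat))

def p_value_score(p):
    """-10 * log10(p); base 10 is an assumption, the text only says 'log'."""
    return math.inf if p == 0.0 else -10.0 * math.log10(p)

# Made-up expression values for one gene (illustration only).
case = [1.0, 3.0]
control = [5.0, 7.0]
p = anova_p_value([case, control])
score = p_value_score(p)
```

With two groups, as in a case/control design, the one-way ANOVA is equivalent to a two-sided Student's t-test with pooled variance. In practice a library routine such as `scipy.stats.f_oneway` would normally replace the hand-written tail probability.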