Quality control, normalization, and filtering of gene expression data.

Thet Su Win
William J. Crisler
Beatrice Dyring-Andersen
Rachel Lopdrup
Jessica E. Teague
Qian Zhan
Victor Barrera
Shannan Ho Sui
Sotirios Tasigiorgos
Naoka Murakami
Anil Chandraker
Stefan G. Tullius
Bohdan Pomahac
Leonardo V. Riella
Rachael A. Clark

Quality control metrics and plots were obtained using the NanoStringQCPro R package (version 1.10.0; ref. 46). A noise threshold was defined as the median expression of the negative controls plus 2 standard deviations. Samples were removed if the median expression of all housekeeping genes in that sample was below the noise threshold. Three samples failed this criterion and were excluded, leaving 35 samples for the subsequent analyses (Banff rejection grade 0, n = 10; grade 1, n = 6; grade 2, n = 8; grade 3, n = 11). Raw mRNA counts were normalized by geometric mean in 2 steps: first using the positive control genes, then using a subset of housekeeping genes. Housekeeping genes were selected for this subset if their expression was above the noise threshold and their mean expression was higher than 200 (a cutoff chosen empirically after examining the average expression of the housekeeping genes). After normalization, genes with values below the noise threshold in all samples were excluded; 769 genes passed this filter and were included in the analysis. Normalized data were then log2-transformed. All analyses, heatmaps, and principal component analyses (PCAs) were performed in R (version 3.5.1).
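
The filtering and normalization steps described above can be illustrated with a minimal sketch. It is written in plain Python for readability and is not the authors' R code or the NanoStringQCPro API. Several details are assumptions rather than statements from the protocol: the noise threshold is computed once from negative-control counts pooled across all samples, using the sample standard deviation; each normalization step rescales a sample so that the geometric mean of its reference probes matches the across-sample average; and a housekeeping gene counts as "above the noise threshold" only if it exceeds it in every retained sample. All function names, probe names, and counts are hypothetical toy values.

```python
import math
import statistics


def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def noise_threshold(counts, negative):
    """Median of the pooled negative-control counts plus 2 standard deviations.

    Assumption: one threshold for the whole data set, sample standard deviation.
    """
    pooled = [counts[s][g] for s in counts for g in negative]
    return statistics.median(pooled) + 2 * statistics.stdev(pooled)


def drop_noisy_samples(counts, housekeeping, threshold):
    """Keep samples whose median housekeeping count is not below the threshold."""
    return {
        s: genes for s, genes in counts.items()
        if statistics.median([genes[g] for g in housekeeping]) >= threshold
    }


def scale_by_reference(counts, reference, targets):
    """Rescale each sample so the geometric mean of its reference probes
    equals the average of those geometric means across samples (assumption)."""
    geo = {s: geometric_mean([counts[s][g] for g in reference]) for s in counts}
    target = statistics.mean(list(geo.values()))
    return {s: {g: counts[s][g] * target / geo[s] for g in targets} for s in counts}


def select_housekeeping(counts, housekeeping, threshold, min_mean=200):
    """Housekeeping genes above the threshold in every sample with mean > min_mean."""
    return [
        g for g in housekeeping
        if all(counts[s][g] > threshold for s in counts)
        and statistics.mean([counts[s][g] for s in counts]) > min_mean
    ]


def filter_and_log2(counts, genes, threshold):
    """Drop genes below the threshold in all samples, then log2-transform.

    Assumes the remaining values are strictly positive.
    """
    kept = [g for g in genes if any(counts[s][g] >= threshold for s in counts)]
    return {s: {g: math.log2(counts[s][g]) for g in kept} for s in counts}


def preprocess(counts, positive, negative, housekeeping, endogenous):
    threshold = noise_threshold(counts, negative)
    kept = drop_noisy_samples(counts, housekeeping, threshold)
    targets = list(housekeeping) + list(endogenous)
    step1 = scale_by_reference(kept, positive, targets)   # positive controls
    stable = select_housekeeping(step1, housekeeping, threshold)
    step2 = scale_by_reference(step1, stable, targets)    # housekeeping subset
    return filter_and_log2(step2, endogenous, threshold), threshold


# Hypothetical toy counts: sample S3 has housekeeping signal in the noise range.
toy_counts = {
    "S1": {"POS_A": 1000, "POS_B": 250, "NEG_A": 4, "NEG_B": 6,
           "HK1": 800, "HK2": 400, "GENE1": 50, "GENE2": 3},
    "S2": {"POS_A": 2000, "POS_B": 500, "NEG_A": 5, "NEG_B": 7,
           "HK1": 1600, "HK2": 800, "GENE1": 100, "GENE2": 6},
    "S3": {"POS_A": 1000, "POS_B": 250, "NEG_A": 6, "NEG_B": 8,
           "HK1": 5, "HK2": 3, "GENE1": 2, "GENE2": 1},
}

normalized, threshold = preprocess(
    toy_counts,
    positive=["POS_A", "POS_B"],
    negative=["NEG_A", "NEG_B"],
    housekeeping=["HK1", "HK2"],
    endogenous=["GENE1", "GENE2"],
)
print(threshold, normalized)
```

In this toy example, S3 is removed at the sample-level check and GENE2 is removed at the gene-level filter, mirroring the two exclusion steps in the protocol. Whether the published analysis computed the threshold per sample or per data set is not stated in the text, so that choice should be checked against the original R code before reuse.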
