Selection of up-regulated genes for each sample

Tianyu Zhang
Liwei Zhang
Fuhai Li

In this study, GTEx normal ovarian tissue samples were used as normal controls against ovarian cancer tumor samples from TCGA. Selecting genes by simple fold change and a t-test p-value <= 0.05 yields too many up-regulated genes. The Maximum Likelihood Estimate (MLE) method (see Fig. 1, red probability density function (PDF) curve) also yields too many up-regulated genes. Thus, we employ a Markov chain Monte Carlo (MCMC) model to simulate the expression distribution of a given gene based on the normal tissues. Let x and D denote the expression of a given gene and the normal tissue samples, respectively.

Fig. 1. Gene expression distribution of gene “CENPH”.

We use conjugate priors for μ and σ², namely the Normal distribution and the Inverse Gamma distribution: μ ~ N(w0, v0), σ² ~ IG(a0, b0). Fully uninformative priors correspond to w0 = 0, v0 = +∞, a0 = 0, b0 = 0. Since eq. (1) is hard to calculate directly, we use the MCMC method to simulate the distribution. The Python package “PyMC3” [20] was employed to conduct the analysis, with w0 = 0, v0 = 10^4, a0 = 10^-3, b0 = 10^-3. The MCMC model performs better than MLE (see the green PDF curve in Fig. 1), but it still selects too many up-regulated genes. To further reduce the number of up-regulated genes, we empirically simulate the PDF of the random variable y = 2x and use the PDF of y to calculate the p-value of a given gene's expression in the ovarian cancer samples. Specifically, we selected up-regulated genes for each tumor sample with fold change >= 2 and p-value <= 0.05 (calculated from the PDF of the random variable y). We take the gene “CENPH” as an example to illustrate this analysis. The PDF generated by the MCMC model is more robust than that generated by MLE (see Fig. 1). The yellow point marks the threshold: the area under the blue curve to the right of the yellow point is about 0.05, which illustrates how the p-value is calculated.
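The selection procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' PyMC3 code: it swaps in a hand-written Gibbs sampler (a simple MCMC scheme that works here because the priors are conjugate) built on the standard library, runs on synthetic "normal tissue" values, treats posterior predictive draws as the simulated distribution of x, and defines fold change as the tumor value divided by the mean normal value. The function names are hypothetical.

```python
import random
import statistics


def gibbs_normal(data, w0=0.0, v0=1e4, a0=1e-3, b0=1e-3,
                 n_iter=3000, burn_in=1000, seed=0):
    """Gibbs sampler for mu ~ N(w0, v0), sigma2 ~ IG(a0, b0), x ~ N(mu, sigma2).

    Returns the post-burn-in (mu, sigma2) draws. v0 is a variance.
    """
    rng = random.Random(seed)
    n = len(data)
    total = sum(data)
    sigma2 = statistics.variance(data)  # starting value
    draws = []
    for i in range(n_iter):
        # mu | sigma2, D is Normal with precision-weighted mean
        v_n = 1.0 / (1.0 / v0 + n / sigma2)
        w_n = v_n * (w0 / v0 + total / sigma2)
        mu = rng.gauss(w_n, v_n ** 0.5)
        # sigma2 | mu, D is Inverse Gamma(a_n, b_n);
        # draw Gamma(shape=a_n, scale=1/b_n) and invert it
        a_n = a0 + n / 2.0
        b_n = b0 + 0.5 * sum((x - mu) ** 2 for x in data)
        sigma2 = 1.0 / rng.gammavariate(a_n, 1.0 / b_n)
        if i >= burn_in:
            draws.append((mu, sigma2))
    return draws


def doubled_predictive(draws, seed=1):
    """Empirical samples of y = 2x, with x drawn from the posterior predictive."""
    rng = random.Random(seed)
    return [2.0 * rng.gauss(mu, s2 ** 0.5) for mu, s2 in draws]


def upper_tail_p(samples, observed):
    """Empirical p-value: fraction of simulated y at or above the observed value."""
    return sum(s >= observed for s in samples) / len(samples)


def is_up_regulated(tumor_value, normal_values, y_samples, fc=2.0, alpha=0.05):
    """Apply both filters: fold change >= fc and p-value <= alpha.

    Fold change relative to the mean normal value is an assumption here.
    """
    fold_change = tumor_value / statistics.fmean(normal_values)
    return fold_change >= fc and upper_tail_p(y_samples, tumor_value) <= alpha


# Synthetic stand-in for one gene's expression in normal tissue (not real data).
_rng = random.Random(42)
normal_values = [_rng.gauss(5.0, 1.0) for _ in range(80)]

draws = gibbs_normal(normal_values)
y_samples = doubled_predictive(draws)
```

Calling `is_up_regulated(tumor_value, normal_values, y_samples)` for each tumor sample's value of the gene then gives a per-sample decision. Using the distribution of y = 2x rather than x pushes the p-value threshold further into the upper tail, which is what reduces the number of selected genes.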
