RNA‐seq data analysis to find DEGs

Myret Ghabriel
Ahmed El Hosseiny
Ahmed Moustafa
Asma Amleh

We used Kallisto version 0.46.1 for pseudoalignment and quantification of transcript abundances from the RNA-Seq data (Bray et al., 2016). Human data were pseudoaligned to the human reference transcriptome GCA_000001405.15_GRCh38, and mouse data to the mouse reference transcriptome GCA_000001635.9_GRCm39, both provided by the Genome Reference Consortium (O'Leary et al., 2016). Kallisto pseudoaligns each read by identifying the transcripts the read is compatible with and assigning it the corresponding target ID; each target ID has a corresponding accession number in the index file. Transcript abundances were then quantified, producing output files that list the transcripts per million (TPM) of each target ID together with its accession number. After quantification, Sleuth version 0.29.0 was used for differential expression analysis of the transcript quantifications between mesenchymal stem cells (MSCs) and their tissue-specific counterparts.
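To illustrate what it means for a read to be "compatible" with a transcript, the toy sketch below builds a k-mer index, pseudoaligns reads by intersecting the target sets of their k-mers, and converts estimated counts to TPM. It is not Kallisto's implementation: the k-mer length of 5, the function names (`build_index`, `pseudoalign`, `tpm`), and the transcript sequences are hypothetical, and this simplified version requires every k-mer of a read to be present in the index. TPM is computed from estimated counts c_i and effective lengths l_i as TPM_i = 10^6 (c_i / l_i) / sum_j (c_j / l_j).

```python
# Toy illustration of k-mer pseudoalignment and TPM; NOT kallisto's implementation.
K = 5  # hypothetical toy k-mer length (kallisto's default k is 31)


def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(transcripts, k=K):
    """Map each k-mer to the set of target IDs whose sequence contains it."""
    index = {}
    for target_id, seq in transcripts.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(target_id)
    return index


def pseudoalign(read, index, k=K):
    """Return the read's compatibility class: targets containing all its k-mers.

    Simplification: an empty set is returned if any k-mer of the read is
    missing from the index or if the read is shorter than k.
    """
    compatible = None
    for km in kmers(read, k):
        targets = index.get(km, set())
        compatible = set(targets) if compatible is None else compatible & targets
        if not compatible:
            return set()
    return compatible if compatible is not None else set()


def tpm(est_counts, eff_lengths):
    """TPM_i = 1e6 * (c_i / l_i) / sum_j (c_j / l_j)."""
    rates = {t: est_counts[t] / eff_lengths[t] for t in est_counts}
    total = sum(rates.values())
    return {t: 1e6 * r / total for t, r in rates.items()}


if __name__ == "__main__":
    transcripts = {"tx1": "ACGTACGGTTCA", "tx2": "ACGTACGGAAGT"}  # hypothetical
    index = build_index(transcripts)
    print(sorted(pseudoalign("ACGTACG", index)))  # ['tx1', 'tx2']
    print(sorted(pseudoalign("CGGTTC", index)))   # ['tx1']
    print(tpm({"tx1": 30.0, "tx2": 10.0}, {"tx1": 100.0, "tx2": 50.0}))
```

A read that shares its k-mers with several transcripts is compatible with all of them, which is why the abundance estimation step is still needed to apportion such reads.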

Sleuth loaded the Kallisto-processed data, estimated the parameters of its “full” response error measurement model followed by those of its reduced model, and performed the differential analysis using the likelihood ratio test. Sleuth normalizes the data, distinguishes between technical and biological variance, and applies shrinkage only to the biological component of variance. It accounts for technical variability in the abundance estimates by modeling the true abundance with a general linear model in which the technical variance is included as error in the response variable. In this way, Sleuth distinguishes between technical and biological sources of variance when determining differentially expressed transcripts.
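The likelihood ratio test between the full and reduced models can be sketched for a single transcript as follows. This is a simplified illustration, not Sleuth's code: it assumes exactly two conditions, a log(count + 0.5) transform of the abundances, a biological variance that has already been shrunk, and a technical variance supplied by the caller (for example, estimated from Kallisto bootstrap replicates). The full model fits one mean per condition, the reduced model a single mean, and both use the total variance (biological plus technical); the statistic is compared with a chi-square distribution with one degree of freedom. The function names are hypothetical.

```python
import math


def gaussian_loglik(residuals, variance):
    """Log-likelihood of independent residuals under N(0, variance)."""
    n = len(residuals)
    ss = sum(r * r for r in residuals)
    return -0.5 * n * math.log(2.0 * math.pi * variance) - ss / (2.0 * variance)


def lrt_two_groups(y, groups, bio_var, tech_var):
    """Likelihood ratio test of a 'full' (per-group means) vs 'reduced' (one mean) model.

    y        -- transformed abundances of one transcript, one value per sample
    groups   -- condition label per sample (exactly two distinct labels)
    bio_var  -- biological variance (assumed already shrunk)
    tech_var -- technical variance (e.g. from bootstrap replicates)
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two conditions")
    total_var = bio_var + tech_var  # technical variance enters as response error

    grand_mean = sum(y) / len(y)
    group_mean = {}
    for g in labels:
        vals = [v for v, lab in zip(y, groups) if lab == g]
        group_mean[g] = sum(vals) / len(vals)

    res_reduced = [v - grand_mean for v in y]
    res_full = [v - group_mean[lab] for v, lab in zip(y, groups)]

    stat = 2.0 * (gaussian_loglik(res_full, total_var)
                  - gaussian_loglik(res_reduced, total_var))
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    # Chi-square survival function with 1 degree of freedom: P(Z^2 > stat).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    counts = [120, 130, 110, 40, 35, 45]  # hypothetical estimated counts
    groups = ["MSC"] * 3 + ["tissue"] * 3
    y = [math.log(c + 0.5) for c in counts]  # assumed log(x + 0.5) transform
    stat, p = lrt_two_groups(y, groups, bio_var=0.05, tech_var=0.01)
    print(f"LRT statistic = {stat:.2f}, p = {p:.2e}")
```

Because the technical variance is added to the error term, a transcript with noisy quantification yields a smaller statistic for the same difference in means, which is the intuition behind separating the two variance sources.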

Accordingly, Sleuth produced a table of significant differentially expressed genes (DEGs) with a q value less than 0.05 (Pimentel et al., 2017). This step generated lists of significant DEGs for each tissue type and species; each list included the gene symbols of the significant DEGs and their corresponding TPMs. The gene symbols of the human lists were compared, and the common DEGs were retrieved with their equivalent expression. This step was repeated for the mouse data. Subsequently, the common DEGs identified among the human MSC samples were cross-referenced against the common DEGs among the mouse MSC samples to produce a list of DEGs common to both species, with their equivalent expression. Gene symbols that were not common between the two species were checked for homology using HomoloGene (http://www.ncbi.nlm.nih.gov/homologene/), and the analysis was repeated. Venn diagrams of the common DEGs were constructed using Venny version 2.0 (Oliveros, 2007). Finally, the expression of the common DEGs was compared and visualized in R version 3.6.1 using RStudio (www.r-project.org) (Team RStudio, 2019), which generated heatmaps and t-distributed stochastic neighbor embedding (t-SNE) clustering plots.
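The filtering and cross-referencing steps can be sketched as set operations. We assume here that Sleuth's q value is a Benjamini-Hochberg adjusted p-value, and we represent each DEG list as a set of gene symbols. The function names, the placeholder gene symbols, and the small mouse-to-human table standing in for a HomoloGene lookup are all hypothetical; the sketch only shows the logic of direct symbol matching followed by a homology-based match.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def significant_symbols(results, alpha=0.05):
    """results: list of (gene_symbol, p_value); return symbols with q < alpha."""
    qvals = benjamini_hochberg([p for _, p in results])
    return {sym for (sym, _), qv in zip(results, qvals) if qv < alpha}


def common_degs(deg_sets):
    """Intersect DEG symbol sets, e.g. one set per tissue type."""
    sets = [set(s) for s in deg_sets]
    return set.intersection(*sets) if sets else set()


def cross_species(human_common, mouse_common, mouse_to_human):
    """Match mouse DEGs to human DEGs directly by symbol, then via a homology table."""
    candidates = set(mouse_common)
    candidates |= {mouse_to_human[s] for s in mouse_common if s in mouse_to_human}
    return set(human_common) & candidates


if __name__ == "__main__":
    # Hypothetical results for one comparison: (gene symbol, p-value)
    results = [("GENEA", 0.01), ("GENEB", 0.04), ("GENEC", 0.03), ("GENED", 0.2)]
    print(significant_symbols(results))  # {'GENEA'}

    human_common = common_degs([{"GENEA", "GENEB", "GENEC"}, {"GENEA", "GENEC"}])
    mouse_common = common_degs([{"Genea", "GENEC"}, {"Genea", "GENEC", "Gened"}])
    homology = {"Genea": "GENEA"}  # hypothetical stand-in for a HomoloGene lookup
    print(sorted(cross_species(human_common, mouse_common, homology)))  # ['GENEA', 'GENEC']
```

The homology step matters because mouse and human symbols for the same gene often differ (for example, in capitalization), so a purely literal comparison would miss them.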
