2.9. RNA-Seq Data Analysis

EB Edyta Biskup
BL Brian Daniel Larsen
LR Leonor Rib
LF Lasse Folkersen
ON Omid Niazi
MK Maria R. Kamstrup
CS Claus Storgaard Sørensen
request Request a Protocol
ask Ask a question
Favorite

Quality control of sequence reads was done using the tools “FastQC” v0.11.2 [21], “RSeQC” v2.6.4 [22] and “fastq_screen” v0.11.4) [23].

Adaptors, low-quality bases and the first 12 bases and reads shorter than 25 nt were removed with “Trimmomatic” v0.39 [24] using settings “LLUMINACLIP:”, TruSeq3-SE.fa”:2:30:10 and HEADCROP:12 LEADING:3 SLIDINGWINDOW:4:15 MINLEN:25”.

Reads were mapped using “STAR” v2.6c [25] against the human genome (hg19). Up to two mismatches were allowed during the mapping, and the minimum number of overlap bases to trigger mates merging and realignment was set to five. Otherwise, default settings were used. Duplicate reads were removed using the “MarkDuplicates” function from the “picard” v2.6.0 software [26].

The “featureCounts” function of the “Rsubread” R package v1.32.4 [27] was used to quantify reads in exons. The Refseq hg19 gene annotation was used to assign reads to genes.

The “edgeR” v3.24.3 software [28] was used to perform a differential expression analysis. For this purpose, first, a model was defined indicating the experimental conditions and the library preparation batch information. Library normalization factors were calculated using the “calcNormFactors” function with the “TMM” algorithm. Tag-wise dispersion was calculated using the “estimateDisp” function with “robust = TRUE”. A gene-wise generalized linear model was fit with “glmQLFit”. Finally, differential gene usage was assessed using “glmQLFTest”. Resulting p-values were corrected for multiple testing using the “Benjamini-Hochberg” method.

Gene set enrichment analyses of all annotated genes were done with the ranked loge fold changes using the “gseGO” function in the “clusterProfiler” R package [29]. Related functions were used to visualize the results together with the “DOSE” R package functions [30]. To summarize the large list of enriched gene ontology (GO) pathways, the “emaplot” was produced, and the collapsed GO pathways were classified manually. For the final dotplot, the GO pathway with the most significant p-value was kept.

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

0/150

tip Tips for asking effective questions

+ Description

Write a detailed description. Include all information that will help others answer your question including experimental processes, conditions, and relevant images.

post Post a Question
0 Q&A