2.9. RNA-Seq Data Analysis

Edyta Biskup; Brian Daniel Larsen; Leonor Rib; Lasse Folkersen; Omid Niazi; Maria R. Kamstrup; Claus Storgaard Sørensen

Improve Research Reproducibility A Bio-protocol resource

Home
Protocols

Concise Method

2.9. RNA-Seq Data Analysis

EB Edyta Biskup

BL Brian Daniel Larsen

LR Leonor Rib

LF Lasse Folkersen

ON Omid Niazi

MK Maria R. Kamstrup

CS Claus Storgaard Sørensen

This method is extracted from research article: Cells, Nov 2020

Photochemotherapy Induces Interferon Type III Expression via STING Pathway

DOI: 10.3390/cells9112452

Request a Protocol

Ask a question

Favorite

Quality control of sequence reads was done using the tools “FastQC” v0.11.2 [21], “RSeQC” v2.6.4 [22] and “fastq_screen” v0.11.4) [23].

Adaptors, low-quality bases and the first 12 bases and reads shorter than 25 nt were removed with “Trimmomatic” v0.39 [24] using settings “LLUMINACLIP:”, TruSeq3-SE.fa”:2:30:10 and HEADCROP:12 LEADING:3 SLIDINGWINDOW:4:15 MINLEN:25”.

Reads were mapped using “STAR” v2.6c [25] against the human genome (hg19). Up to two mismatches were allowed during the mapping, and the minimum number of overlap bases to trigger mates merging and realignment was set to five. Otherwise, default settings were used. Duplicate reads were removed using the “MarkDuplicates” function from the “picard” v2.6.0 software [26].

The “featureCounts” function of the “Rsubread” R package v1.32.4 [27] was used to quantify reads in exons. The Refseq hg19 gene annotation was used to assign reads to genes.

The “edgeR” v3.24.3 software [28] was used to perform a differential expression analysis. For this purpose, first, a model was defined indicating the experimental conditions and the library preparation batch information. Library normalization factors were calculated using the “calcNormFactors” function with the “TMM” algorithm. Tag-wise dispersion was calculated using the “estimateDisp” function with “robust = TRUE”. A gene-wise generalized linear model was fit with “glmQLFit”. Finally, differential gene usage was assessed using “glmQLFTest”. Resulting p-values were corrected for multiple testing using the “Benjamini-Hochberg” method.

Gene set enrichment analyses of all annotated genes were done with the ranked log_e fold changes using the “gseGO” function in the “clusterProfiler” R package [29]. Related functions were used to visualize the results together with the “DOSE” R package functions [30]. To summarize the large list of enriched gene ontology (GO) pathways, the “emaplot” was produced, and the collapsed GO pathways were classified manually. For the final dotplot, the GO pathway with the most significant p-value was kept.

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

0/150

tip Tips for asking effective questions

+ Description

Write a detailed description. Include all information that will help others answer your question including experimental processes, conditions, and relevant images.

Post a Question

0 Q&A

Share your protocol with your peers.

Submit a Preprint Protocol