RNA-Seq Data Processing and Analysis

KK Karen J. Kloth
IA Ilka N. Abreu
ND Nicolas Delhomme
IP Ivan Petřík
CV Cloé Villard
CS Cecilia Ström
FA Fariba Amini
ON Ondřej Novák
TM Thomas Moritz
BA Benedicte R. Albrectsen
request Request a Protocol
ask Ask a question
Favorite

Data preprocessing was performed according to the Epigenesys guidelines(http://www.epigenesys.eu/en/protocols/bio-informatics/1283-guidelines-for-rna-seq-data-analysis). In brief, raw sequence quality was assessed with FastQC v0.11.4 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and summarized with MultiQC v1.3 (http://multiqc.info/). Residual rRNA contamination was determined and removed using SortMeRNA v2.1, settings –log–paired_in–fastx–sam–num_alignments 1 (Kopylova et al., 2012) using the rRNA sequences provided with SortMeRNA (rfam-5s-database-id98.fasta, rfam-5.8s-database-id98.fasta, silva-arc-16s-database-id95.fasta, silva-bac-16s-database-id85.fasta, silva-euk-18s-database-id95.fasta, silva-arc-23s-database-id98.fasta, silva-bac-23s-database-id98.fasta and silva-euk-28s-database-id98.fasta). Adapters were removed and data were trimmed for quality with Trimmomatic v0.36, settings TruSeq3-PE-2.fa:2:30:10 SLIDINGWINDOW:5:20 MINLEN:50 (Bolger et al., 2014). A second FastQC quality control was performed to ensure that no technical artifacts had been introduced. Read counts were obtained with kallisto v0.43.0 settings quant -b 100–pseudobam -t 1–rf-stranded (Bray et al., 2016) using the ARAPORT11 cDNA sequences as a reference (Cheng et al., 2017). Abundance values where imported into R v3.4.3 (R-Core-Team, 2017) using Bioconductor v3.6 (Gentleman et al., 2004) tximport package (Soneson et al., 2015). Read counts were normalized using a variance stabilizing transformation as implemented in DESeq2 v1.14.1 (Love et al., 2014) using batch + treatment * plant line as the design. Custom-made R scripts were written for downstream analyses. Plant lines were compared within treatment, and only DEGs with an adjusted P value < 0.01 and a log2 fold change ≥ 0.5 were considered (Schurch et al., 2016). DEGs in the control treatment were clustered based on the z-scores of normalized expression in the pae9-2 mutant with Ward’s method and Pearson correlation tests. Heatmaps were constructed with the R package gplots v3.0.1 (https://www.rdocumentation.org/packages/gplots/versions/3.0.1). Pertubations per gene cluster were visualized with the lsd R package (Schwalb et al., 2018). DEGs were tested for overrepresentation of biological processes against a reference set including all transcripts in the complete data set with at least one count, using the application BiNGO in Cytoscape (Maere et al., 2005; Cline et al., 2007). Redundant gene ontology (GO) terms were removed and GO networks were inferred with REViGO (Supek et al., 2011). The custom R scripts used for the analyses are available from our GitHub repository: https://github.com/UPSCb/UPSCb/blob/master/manuscripts/Kloth2018. The RNA-Seq data are available from the European Nucleotide Archive (ENA, https://ebi.ac.uk/ena) under the accession number PRJEB27944.

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

post Post a Question
0 Q&A