RNA-Seq and statistical analysis

Keith S. K. Fong
Robert B. Hufnagel
Vedbar S. Khadka
Michael J. Corley
Alika K. Maunakea
Ben Fogelgren
Zubair M. Ahmed
Scott Lozanoff

RNA was purified from the anterior part of E9 heads (18-22 somites) using the RNeasy Plus Universal mini kit (QIAGEN Inc., Valencia, CA). Quality and concentration were determined using a 2100 Bioanalyzer (Agilent Technologies) and a NanoDrop spectrophotometer (NanoDrop, Wilmington, DE). RNA with an RNA integrity number (RIN) of at least 9 was pooled for a total of 300 ng from three biological replicates for each condition (wild type and mutant). RNA was poly-A selected using Dynabeads (Thermo Fisher Scientific, Waltham, MA). cDNA libraries were constructed using the Ion Total RNA-Seq Kit v2 (Thermo Fisher Scientific) according to the manufacturer's protocol for poly-A-selected RNA. RNA-Seq libraries were templated using an Ion OneTouch 2 System (Thermo Fisher Scientific) and sequenced using a 200 bp kit on an Ion Proton sequencing system (Life Technologies) according to the manufacturer's instructions.

Ion Torrent Suite was used to obtain FASTQ sequencing data. Sequenced single-end reads (66,187,511 for wild type and 53,803,778 for tuft) were trimmed and filtered using PRINSEQ (Schmieder and Edwards, 2011). Low-quality bases were trimmed from both ends of each read until a base with a Phred quality score ≥20 (at least 99% accurate) was reached, and reads shorter than 20 nucleotides after trimming were discarded.
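The trimming and length-filtering rule described above can be illustrated with a minimal Python sketch. This is not PRINSEQ's implementation; the function names are hypothetical, and the Phred+33 quality encoding is an assumption about the FASTQ files.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(char) - offset for char in quality_string]


def error_probability(phred):
    """Base-call error probability implied by a Phred score: 10^(-Q/10)."""
    return 10 ** (-phred / 10)


def trim_read(sequence, quality_string, min_quality=20, min_length=20):
    """Trim low-quality bases from both ends, then apply a length filter.

    Returns (sequence, quality) for the trimmed read, or None when the
    trimmed read is shorter than min_length.
    """
    scores = phred_scores(quality_string)
    start, end = 0, len(scores)
    # Advance from the 5' end until a base with score >= min_quality.
    while start < end and scores[start] < min_quality:
        start += 1
    # Retreat from the 3' end until a base with score >= min_quality.
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    if end - start < min_length:
        return None
    return sequence[start:end], quality_string[start:end]
```

A Phred score of 20 corresponds to an error probability of 0.01, which is the "at least 99% accurate" criterion stated in the text.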

The Mus musculus UCSC mm10 reference genome was indexed with Bowtie2 v2.2.5. Processed reads were aligned to the reference genome using TopHat v2.0.14 (Kim et al., 2013), which uses the Bowtie2 algorithm (Langmead and Salzberg, 2012) to perform the alignment. The resulting alignment (.BAM) files were analyzed with Cufflinks v2.1.1 (Trapnell et al., 2010), which quantified transcript abundance in reads per kilobase of exon model per million mapped reads (RPKM). SAMtools v0.1.18 (Li et al., 2009) was used for sorting and BAM conversion, and the htseq-count script from the HTSeq package was used to count reads mapped to mouse gene models.
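For readers unfamiliar with the unit, the definition of RPKM can be written as a short Python sketch. It shows only the arithmetic behind the unit, not Cufflinks' abundance estimation, which uses a statistical model; the function name and the example numbers are hypothetical.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    RPKM = read_count / (exon_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (exon_length_bp * total_mapped_reads)
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Illustrative values only: 500 reads on a 2 kb exon model,
# in a library with 50 million mapped reads.
example_value = rpkm(500, 2000, 50_000_000)
```

Dividing by both exon length and library size makes values comparable across genes of different lengths and across libraries of different sequencing depths.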

Differentially expressed genes were identified from the count data using the non-parametric NOISeq-sim program (Tarazona et al., 2011) with default parameters, trimmed mean of M-values (TMM) normalization, and an estimated probability of differential expression (PNOI) > 0.95 as the threshold. The probability (1-PNOI) reported by NOISeq can be considered equivalent to a q-value [false discovery rate (FDR)-adjusted P-value] (Zheng and Moriyama, 2013). Gene set enrichment analysis on the expressed genes was conducted using GSEA (http://www.broadinstitute.org/gsea/) with the recommended default parameters of 1000 permutations and FDR < 0.25 as the threshold for enrichment in phenotype. Data were deposited in the Gene Expression Omnibus (GEO) under accession number GSE75001.
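The thresholding step can be sketched in Python as follows. This assumes per-gene PNOI values have already been exported from NOISeq; the function name, gene identifiers, and probabilities are hypothetical and are not results from this study.

```python
def select_differential(probabilities, threshold=0.95):
    """Keep genes whose probability of differential expression exceeds the threshold.

    Returns a dict mapping each selected gene to 1 - PNOI, the quantity
    the text treats as comparable to a q-value.
    """
    selected = {}
    for gene, prob in probabilities.items():
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range for {gene}: {prob}")
        if prob > threshold:
            selected[gene] = 1.0 - prob
    return selected


# Hypothetical PNOI values for illustration only.
example_probabilities = {"geneA": 0.99, "geneB": 0.95, "geneC": 0.40}
example_selected = select_differential(example_probabilities)
```

The comparison is strict, so a gene with PNOI exactly equal to 0.95 is not selected, matching the ">0.95" criterion in the text.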
