An in-house pipeline was established for RNA-seq data processing and analysis. Reads were aligned to the mouse mm10 reference genome using STAR 2.5.2b [18] with the following parameters: --outFilterMultimapNmax 10 --alignSJoverhangMin 10 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 3 --twopassMode Basic. The Gencode annotation for mm10 (version vM11) was used as the reference for building STAR indexes and for alignment annotation [19]. For each sample, a BAM file containing mapped and unmapped reads, including reads spanning splice junctions, was produced. Secondary alignments and multi-mapped reads were then removed using in-house scripts, so that only uniquely mapped reads were retained for further analysis.
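The in-house filtering scripts are not described further. As an illustration only, the following minimal Python sketch shows one way such a filter could work on SAM-formatted text. It assumes that uniqueness is judged from the NH tag (number of reported alignments, which STAR writes by default) together with the SAM flag bits for unmapped (0x4) and secondary (0x100) alignments. The function names and example records are hypothetical and do not come from the authors' pipeline.

```python
def is_uniquely_mapped(sam_line: str) -> bool:
    """Return True for a mapped, primary alignment carrying NH:i:1."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:      # read is unmapped
        return False
    if flag & 0x100:    # secondary alignment
        return False
    # Optional tags start at column 12 of a SAM record.
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[5:]) == 1
    return False        # no NH tag: uniqueness cannot be confirmed, so drop


def filter_sam(lines):
    """Yield header lines unchanged and only uniquely mapped alignments."""
    for line in lines:
        if line.startswith("@") or is_uniquely_mapped(line):
            yield line


# Hypothetical records: one unique read, one multi-mapped read with a
# secondary alignment, and one unmapped read.
example = [
    "@HD\tVN:1.4\tSO:coordinate\n",
    "r1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*\tNH:i:1\tHI:i:1\n",
    "r2\t0\tchr1\t200\t3\t50M\t*\t0\t0\t*\t*\tNH:i:2\tHI:i:1\n",
    "r2\t256\tchr2\t300\t3\t50M\t*\t0\t0\t*\t*\tNH:i:2\tHI:i:2\n",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\tuT:A:1\n",
]
kept = list(filter_sam(example))
```

In this sketch only the header line and read r1 are kept. A production filter would more likely operate directly on BAM files with a dedicated library, which is outside the scope of this illustration.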
Overall quality control metrics were computed using RSeQC with the provided UCSC mm10 gene model [20]. These metrics include the number of reads remaining after multiple-step filtering, ribosomal RNA read depletion, and the numbers of reads mapped to exons, UTRs, and intronic regions.
Gene-level expression was quantified using HTSeq version 0.6.0 in intersection-strict mode over exonic regions [21]. Counts were calculated for protein-coding genes annotated in the mm10 Gencode GTF file (version vM11). CPM (counts per million reads) values were calculated using edgeR [22]. To retain only expressed genes, we used a “by condition” log2(CPM + 1) cutoff. Briefly, a gene was considered expressed if log2(CPM + 1) ≥ 0.5 in all four replicates of a given condition (WT/VB6(+), WT/VB6(−), KO/VB6(+), KO/VB6(−)). In total, we detected 13,428 expressed protein-coding genes in our data and used those genes for differential and co-expression analyses.