Raw sequencing reads were demultiplexed by OGT. FastQC (version 0.11.3; ref. 91) was used to assess sequencing read quality. Reads were mapped to the mouse genome (GRCm38.p5, GENCODE release M14; refs. 92, 93) using STAR (version 2.5.3; ref. 94) with the option --outFilterIntronMotifs RemoveNonCanonical and otherwise default parameters, and only uniquely mapping reads were selected for further analysis. Ribosomal RNA and mitochondrial reads were removed using modified scripts from the PORT pipeline (https://github.com/itmat/Normalization). SAM files were converted to BAM files using Samtools view, and the BAM files were sorted by coordinate with Samtools sort. Annotations were based on the comprehensive gene annotation file of GENCODE release M14 (mm10). The downloaded Gene Transfer Format (GTF) file was loaded into R (version 3.4.0; R Foundation for Statistical Computing) and converted into a TranscriptDb object with the makeTranscriptDbFromGFF function in the Bioconductor package GenomicFeatures (version 1.28.3; ref. 95). From the TranscriptDb object, all annotated Ensembl genes and their exons were obtained using exonsBy (by = “gene”). Ensembl gene IDs were replaced by official gene symbols using biomaRt (version 2.32.1). Genes with several Ensembl gene IDs were combined into 1 record. For each such duplicated gene, overlapping exons were combined into single exons. Thus, genes were defined as the sequence between the first base of the first exon and the last base of the last exon. Furthermore, regions shared by overlapping genes were removed because the RNA-Seq library was nonstranded and only reads mapping to exactly 1 gene were to be counted. The mapped, filtered RNA-Seq reads were counted using a custom R script that used the R packages Rsamtools (1.22.0), GenomicFeatures (1.22.8), and GenomicAlignments (1.6.3; ref. 95).
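
The gene-model construction described above (merging overlapping exons, defining each gene by its outermost exon coordinates, and discarding regions shared between genes) can be illustrated with a minimal Python sketch. This is not the authors' R code, which relied on GenomicFeatures. The function names, the single-chromosome assumption, and the toy coordinates are hypothetical. Whether the original script removed shared regions at the level of gene spans or of exons is not stated, so the sketch assumes gene spans.

```python
from typing import Dict, List, Tuple

# 1-based, inclusive (start, end) coordinates, as used in GTF files.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping intervals into single intervals.

    Assumption: only true overlaps are merged; directly adjacent
    intervals are kept separate.
    """
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gene_spans(exons_by_gene: Dict[str, List[Interval]]) -> Dict[str, Interval]:
    """Define each gene as first base of first exon to last base of last exon."""
    return {
        gene: (min(s for s, _ in exons), max(e for _, e in exons))
        for gene, exons in exons_by_gene.items()
    }


def subtract(span: Interval, blockers: List[Interval]) -> List[Interval]:
    """Return the parts of `span` that are not covered by any blocker."""
    pieces: List[Interval] = []
    cursor, stop = span
    for b_start, b_end in merge_intervals(blockers):
        if b_end < cursor or b_start > stop:
            continue  # blocker lies entirely outside the remaining span
        if b_start > cursor:
            pieces.append((cursor, b_start - 1))
        cursor = max(cursor, b_end + 1)
    if cursor <= stop:
        pieces.append((cursor, stop))
    return pieces


def unique_regions(spans: Dict[str, Interval]) -> Dict[str, List[Interval]]:
    """For every gene, keep only the bases that belong to no other gene."""
    return {
        gene: subtract(span, [s for g, s in spans.items() if g != gene])
        for gene, span in spans.items()
    }


# Hypothetical toy annotation on a single chromosome.
toy_exons = {
    "GeneA": [(100, 200), (150, 300), (450, 500)],
    "GeneB": [(400, 600), (700, 800)],
    "GeneC": [(1000, 1100)],
}
toy_unique = unique_regions(gene_spans(toy_exons))
```

In this toy annotation GeneA and GeneB overlap between positions 400 and 500, so that stretch is dropped from both genes, while GeneC is left untouched. A read falling in the shared stretch would then be counted for neither gene, which is the behavior the text motivates for a nonstranded library.
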
Briefly, sorted BAM files were loaded into R using readGAlignmentPairs, and the number of reads mapping to each gene was computed using findOverlaps (with options type = “within” and ignore.strand = TRUE) and countSubjectHits. Genes were analyzed for differential expression using the R package DESeq2 (1.16.1; ref. 96) and were considered differentially expressed if the FDR-adjusted P value was smaller than 0.05. PCA plots and heatmaps were generated in R. Gene abundances were expressed as rlog-transformed counts normalized by DESeq2’s median-of-ratios method: counts were divided by sample-specific size factors, each determined as the median ratio of a sample’s gene counts to the per-gene geometric mean across samples, which normalizes for sequencing depth and RNA composition. PCA was then performed on the 500 most variable genes across the 9 samples to assess how the samples clustered relative to one another (i.e., to verify the biological groups based on the RNA-Seq data). Canonical pathway enrichment analysis of the identified differentially expressed genes was then conducted using the IPA tool (Qiagen), chosen for its comprehensive curated annotation resources.
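
The median-of-ratios normalization described above can be illustrated with a short, dependency-free Python sketch. It is not DESeq2 itself: the function names and the toy count matrix are hypothetical, and the rlog transformation and PCA steps are not reproduced. As in the median-of-ratios approach, genes with a zero count in any sample are excluded when the size factors are estimated.

```python
import math
from statistics import median
from typing import List


def size_factors(counts: List[List[int]]) -> List[float]:
    """Estimate one size factor per sample by the median-of-ratios method.

    counts[g][s] is the raw read count of gene g in sample s.
    """
    n_samples = len(counts[0])
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue  # geometric mean is zero, so the gene is uninformative
        # Log of the geometric mean of this gene across all samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for s, c in enumerate(row):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("every gene has a zero count in at least one sample")
    # Median ratio to the per-gene geometric mean, back on the linear scale.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts: List[List[int]]) -> List[List[float]]:
    """Divide each raw count by the size factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Hypothetical toy matrix: sample 2 was sequenced exactly twice as deeply
# as sample 1; the last gene contains a zero and is skipped for estimation.
toy_counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 7],
]
toy_factors = size_factors(toy_counts)
toy_normalized = normalize(toy_counts)
```

Because the second toy sample has twice the depth of the first, its size factor is twice as large, and after division the first three genes have equal normalized counts in both samples. Using the median of the ratios rather than the total read count keeps a few very highly expressed genes from dominating the estimate, which is how the method accounts for RNA composition as well as depth.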

Note: The content above has been extracted from a research article, so it may not display correctly.


