RNA-sequencing

Naoko Kogata
Philip Bland
Mandy Tsang
Erik Oliemuller
Anne Lowe
Beatrice A. Howard

Complementary DNA (cDNA) libraries were prepared at the Oxford Genomics Centre, The Wellcome Trust Centre for Human Genetics, using the PolyA+ RNA enrichment method for total RNA from cultured cells and the SMARTer method for total RNA from embryonic mammary tissue. For the PolyA+ libraries, the messenger RNA fraction was selected from the total RNA before conversion to cDNA. Second-strand cDNA synthesis incorporated dUTP. The cDNA was end-repaired, A-tailed and adapter-ligated. Prior to amplification, samples underwent uridine digestion. The prepared libraries were size-selected, multiplexed and quality-checked before paired-end sequencing over three lanes of a flow cell. For embryonic mammary tissue, amplified cDNA was generated with the SMARTer Amplification Kit. The cDNA was end-repaired, A-tailed, adapter-ligated and amplified. The prepared libraries were size-selected, multiplexed and quality-checked before paired-end sequencing over four lanes of a flow cell. Data were aligned to the mouse reference genome, mm10, and quality-checked.

RNA-sequencing files were submitted to ArrayExpress under accessions E-MTAB-6846, E-MTAB-6856 and E-MTAB-6859. FastQ files were truncated to a consistent length of 75 bp using Trim Galore v0.4.3 and then aligned against the mouse GRCm38 genome assembly using HISAT2 v2.0.5 with the options --no-mixed and --no-discordant. Mapped positions with a mapping quality score of <20 were discarded. Gene expression was quantitated using the RNA-sequencing quantitation pipeline in SeqMonk v1.37.0 (https://www.bioinformatics.babraham.ac.uk/projects/seqmonk/) in opposing strand-specific library mode. For count-based statistics, raw read counts over the exons of each gene were used. For visualisation and other statistics, log2 RPM (reads per million reads of library) expression values were used. Differentially expressed genes were those passing two statistical filters: the DESeq2 likelihood ratio test (LRT) with a cutoff of p < 0.05 after multiple testing correction, and the SeqMonk Intensity Difference filter on log2 RPM values with a sample size of 1% of all genes and a cutoff of p < 0.05 after multiple testing correction. Hierarchical clustering was performed on per-gene median-centred log2 RPM expression values using Pearson’s correlation. Gene clusters were separated by segmenting the tree at an R value of 0.5. Clusters containing <50 genes were discarded. PCA was performed on column-centred log2 RPM values without additional scaling.
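The expression transformations described above can be sketched in a few lines of Python. This is an illustration only, not the SeqMonk implementation: the function names are hypothetical, and the pseudocount added before taking the logarithm is an assumption, since the text does not say how zero counts were handled.

```python
import math
from statistics import median


def log2_rpm(counts, pseudocount=1.0):
    """Convert raw per-gene read counts for one library to log2 RPM.

    The pseudocount is an assumption made here to avoid log2(0);
    the protocol text does not specify one.
    """
    total = sum(counts)
    return [math.log2(c * 1e6 / total + pseudocount) for c in counts]


def median_centre(row):
    """Subtract a gene's median across samples from each of its values."""
    m = median(row)
    return [v - m for v in row]


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has no defined correlation; treat as 0
    return sxy / math.sqrt(sxx * syy)
```

Under this sketch, two genes would fall in the same cluster branch when the correlation of their median-centred profiles is at or above the R = 0.5 cut described in the text; the tree construction itself is not reproduced here.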

For each gene, the intensity difference test used a locally matched subset of 1% of genes, chosen by similarity in average expression. The standard deviation of the log2 RPM difference values within this subset was calculated, and the observed difference for the gene being tested was converted to a probability using the cumulative distribution function of a normal distribution with this standard deviation. P values were corrected for multiple testing using the Benjamini and Hochberg method.
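The logic of this test can be sketched as follows. It is a minimal illustration written from the description above, not SeqMonk's code. Several details are assumptions: the function names are hypothetical, the normal distribution is centred on zero, the p value is two-sided, the local window is taken as the nearest genes by rank of average expression, and the tested gene is included in its own window.

```python
from statistics import NormalDist, pstdev


def intensity_difference_pvalues(a, b, fraction=0.01):
    """Per-gene p values for the difference between two log2 RPM vectors."""
    n = len(a)
    k = max(2, round(n * fraction))          # size of the local subset
    diffs = [x - y for x, y in zip(a, b)]
    avgs = [(x + y) / 2 for x, y in zip(a, b)]
    order = sorted(range(n), key=lambda i: avgs[i])
    pvals = [1.0] * n
    for pos, gene in enumerate(order):
        # Window of k genes with the most similar average expression.
        lo = max(0, min(pos - k // 2, n - k))
        window = [diffs[j] for j in order[lo:lo + k]]
        sd = pstdev(window)                   # local SD of the differences
        if sd == 0:
            pvals[gene] = 1.0 if diffs[gene] == 0 else 0.0
            continue
        # Assumption: zero-centred normal, two-sided tail probability.
        pvals[gene] = 2 * NormalDist(0, sd).cdf(-abs(diffs[gene]))
    return pvals


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):              # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Genes would then be retained when `benjamini_hochberg(intensity_difference_pvalues(a, b))` gives a value below 0.05, matching the cutoff stated in the text.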
