Known RNA-edited sites were curated from two publicly available databases: (1) the Rigorously Annotated Database of A-to-I RNA editing (RADAR) (Ramaswami and Li 2014); and (2) the DAtabase of RNA EDiting (DARNED) (Kiran and Baranov 2010). DARNED also included the results of a recent study in which RNA-editing sites were identified from whole brains of 15 inbred laboratory mouse strains (Danecek et al. 2012). These nucleotide coordinates were then used to extract reads from the samples using the “knownsites.py” program from REDItools (Picardi and Pesole 2013). We filtered out coordinates that were covered by fewer than 10 reads in any 10 samples (irrespective of whether they were cases or controls). In addition, the nonreference (edited) nucleotide was required to be covered by at least five reads.
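The coverage criteria above can be sketched in a few lines of Python. This is only an illustration, not the REDItools implementation: the per-sample data layout (`total` and `alt` read counts) and the function name `passes_coverage` are hypothetical. The code also assumes one particular reading of the filter, namely that a site is kept when at least 10 samples each have at least 10 reads and the edited nucleotide is supported by at least five reads in at least one sample.

```python
from typing import Dict, List

MIN_READS = 10      # minimum total reads per sample at the site
MIN_SAMPLES = 10    # minimum number of samples meeting MIN_READS
MIN_ALT_READS = 5   # minimum reads supporting the edited nucleotide


def passes_coverage(samples: List[Dict[str, int]]) -> bool:
    """Return True if a site meets the coverage criteria (assumed reading)."""
    # Count samples with enough total reads at this coordinate.
    well_covered = sum(1 for s in samples if s["total"] >= MIN_READS)
    # Assumption: alternate-allele support is required in at least one sample.
    has_alt_support = any(s["alt"] >= MIN_ALT_READS for s in samples)
    return well_covered >= MIN_SAMPLES and has_alt_support


# Hypothetical site: 12 samples, each with 15 reads, 6 of them edited.
site = [{"total": 15, "alt": 6} for _ in range(12)]
print(passes_coverage(site))
```

Cases and controls are pooled in the same list here, matching the statement that the filter is applied irrespective of case or control status.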
Computationally, RNA editing is identified as a single-nucleotide change between DNA and RNA. We identified RNA-edited sites using REDItools (Picardi and Pesole 2013) with the default settings for most parameters, except for: a minimum base quality of 25; a minimum mapping quality of 20 (mapping quality reflects the probability that a read is misaligned, e.g., because it aligns to multiple locations); a probability of misalignment of 0.01 (i.e., a 99% probability that a read is correctly aligned in the genome); and a minimum read coverage of 10 per edited site. As described previously (Danecek et al. 2012), prediction of RNA editing (RE) is prone to base-calling and alignment biases; hence, tightening these parameters reduces the number of falsely predicted RE events. In addition to these parameters, we required a Benjamini-Hochberg (BH)-corrected Fisher's exact test (FET) P-value < 0.05 for overrepresentation of the alternate allele at each site meeting the above-mentioned criteria. Initially, a site was considered to be edited if at least one sample showed a significant enrichment of the alternate allele (BH-corrected FET P-value < 0.05). Identified edited sites were also filtered against all known mouse single-nucleotide variants available from the Ensembl mouse SNP database version 137. Additional biases, such as strand bias and variant distance bias (VDB), were removed as described in Danecek et al. (2012). VDB evaluates the likelihood of the mean pairwise distance of the variant bases in the aligned portion of the reads; it was calculated using SAMtools/BCFtools, and the filter was set to 0.015 (Danecek et al. 2012). Strand bias was calculated by testing for overrepresentation of alternate alleles on the positive versus the negative strand, and a P-value > 0.05 was used to filter the sites/samples.
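The two statistical steps named above, Fisher's exact test and the Benjamini-Hochberg correction, can be illustrated with a standard-library-only sketch. It is not the code used by REDItools or by the authors. The function names are hypothetical, and the 2x2 table layout used for strand bias (alternate and reference read counts on each strand) is an assumption, since the text does not specify how the table was built.

```python
from math import comb
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return BH-adjusted P-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical strand counts at one site (assumed table layout):
# rows = alternate / reference allele, columns = forward / reverse strand.
strand_p = fisher_exact_two_sided(12, 10, 40, 38)
# Following the text, P > 0.05 is read here as "no evidence of strand bias".
print(strand_p > 0.05)

# Hypothetical per-site raw P-values, corrected together.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In practice, a library routine such as SciPy's `fisher_exact` would normally replace the hand-written test; the plain-Python version is used here only to keep the sketch self-contained.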
Further, we removed reads with very high sequence similarity (>95%) to other genomic regions and filtered out sites that resided within regions of the mouse genome harboring duplication events, as well as sites within 4 bp of an exon-intron boundary (the latter because RNA-seq mapping near exon boundaries tends to be unreliable [Danecek et al. 2012]).
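As a rough illustration of the exon-intron boundary filter, the sketch below flags a site that lies within 4 bp of an exon edge. The function name, the coordinate convention (1-based, inclusive exon intervals), and the treatment of every exon start and end as a boundary are assumptions made for the example. A real annotation would need to exclude transcript start and end positions, which are not exon-intron junctions.

```python
from typing import List, Tuple

BOUNDARY_MARGIN = 4  # bp, as stated in the text


def near_exon_boundary(
    pos: int,
    exons: List[Tuple[int, int]],
    margin: int = BOUNDARY_MARGIN,
) -> bool:
    """True if pos lies within `margin` bp of any exon start or end."""
    return any(
        abs(pos - edge) <= margin
        for start, end in exons
        for edge in (start, end)
    )


# Hypothetical two-exon transcript.
exons = [(100, 200), (300, 400)]
candidate_sites = [103, 150, 204, 205, 350]
kept = [p for p in candidate_sites if not near_exon_boundary(p, exons)]
print(kept)
```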
Finally, we removed all sites that were supported by fewer than 10 samples (5% of the total samples) with at least five reads supporting an alternate nucleotide; sites in Hardy-Weinberg equilibrium (P-value > 0.05) were also removed. In addition, for the clustered RE data set, we removed RE sites that were not within 50 bp of another RE site at the transcript level.
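The clustering criterion can be sketched as a nearest-neighbor check on sorted positions. This is a minimal example under stated assumptions: positions are taken to be transcript-level coordinates on a single transcript, "within 50 bp" is read as a distance of at most 50, and the function name `clustered_sites` is hypothetical.

```python
from typing import List


def clustered_sites(positions: List[int], max_gap: int = 50) -> List[int]:
    """Keep positions that have at least one other position within max_gap bp."""
    pos = sorted(set(positions))
    kept = []
    for i, p in enumerate(pos):
        # After sorting, only the immediate neighbors need to be checked.
        left_ok = i > 0 and p - pos[i - 1] <= max_gap
        right_ok = i + 1 < len(pos) and pos[i + 1] - p <= max_gap
        if left_ok or right_ok:
            kept.append(p)
    return kept


# Hypothetical RE sites on one transcript; 500 has no neighbor within 50 bp.
print(clustered_sites([100, 130, 500, 1000, 1049]))
```

Sites on different transcripts would be grouped by transcript first and passed to the function one group at a time.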