Sequence conservation and complementary substitutions

Svetlana Kalmykova
Marina Kalinina
Stepan Denisov
Alexey Mironov
Dmitry Skvortsov
Roderic Guigó
Dmitri Pervouchine

To assess the degree of evolutionary conservation of a CCR, we computed the difference between the average PhastCons conservation score [99] of all its nucleotides and the average PhastCons conservation score of the same number of nucleotides in its flanking regions within the same phastConsElements interval.
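The conservation difference described above can be illustrated with a minimal Python sketch. The text does not specify how the flanking nucleotides are chosen, so the sketch assumes they are the positions nearest to the CCR on either side within the same element. The function name `conservation_difference` and its arguments are hypothetical and are not taken from the authors' code.

```python
from statistics import mean


def conservation_difference(scores, element_start, ccr_start, ccr_end):
    """Mean PhastCons score of a CCR minus the mean score of its flanks.

    scores        -- per-base PhastCons scores of one phastConsElements
                     interval; scores[0] corresponds to element_start
    ccr_start/end -- half-open genomic coordinates of the CCR, which must
                     lie inside the element

    Assumption (not stated in the text): the flank consists of the
    positions closest to the CCR, taken from both sides, until as many
    flank positions as CCR positions have been collected.
    """
    lo = ccr_start - element_start
    hi = ccr_end - element_start
    if not (0 <= lo < hi <= len(scores)):
        raise ValueError("CCR must lie inside the element")
    n = hi - lo

    # Rank all non-CCR positions by their distance to the CCR boundary.
    outside = [i for i in range(len(scores)) if i < lo or i >= hi]
    outside.sort(key=lambda i: (lo - i) if i < lo else (i - hi + 1))
    flank = outside[:n]
    if len(flank) < n:
        return None  # element too short to provide an equal-sized flank

    ccr_mean = mean(scores[lo:hi])
    flank_mean = mean([scores[i] for i in flank])
    return ccr_mean - flank_mean
```

A positive value indicates that the CCR is more conserved than its immediate surroundings within the same conserved element.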

To assess the number of complementary substitutions in PCCRs and their statistical significance, we used the global multiple sequence alignments (MSAs) of 99 vertebrate genomes with the human genome [139]. For each PCCR, we extracted the two parts of the MSA corresponding to its two CCRs using the Bio.AlignIO.MafIO module from the Biopython library [140]. Organisms that had indels relative to the reference organism (hg19) in either of the two CCRs were removed. The number of orthologous sequences for each PCCR ranged from 15 to 99. The two alignment blocks were merged through an additional spacer of ten adenine nucleotides, resulting in an MSA in STOCKHOLM format with an RNA secondary structure generated by PrePH. Next, we restricted the phylogenetic tree of the original MSA [139] to the organisms available for the given PCCR and passed the tree and the MSA to R-scape v1.2.3 [40] with the following parameters: -E 1 -s --samplewc --nofigures. The .out files produced by R-scape were parsed by custom scripts to extract the E values of individual base pairs. The E value of a PCCR was defined as the product of the E values of the base pairs that R-scape marked as having significant covariation. As a result of this procedure, E values were obtained for 909,146 PCCRs; the 539,264 PCCR E values that were <1 were adjusted using the Benjamini–Hochberg correction. The MSA and the phylogenetic trees were downloaded from the UCSC Genome Browser website (http://hgdownload.cse.ucsc.edu/goldenpath/hg19/multiz100way/). Structural alignments were visualized using the tableGrob function from the gridExtra R package.
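The bookkeeping steps of this procedure (removing organisms with indels, joining the two CCR blocks through the spacer, combining base-pair E values, and the Benjamini–Hochberg adjustment) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' custom scripts: all function names are hypothetical, alignment blocks are assumed to be dictionaries mapping an organism name to its aligned row, a PCCR without significant base pairs is assumed to get no E value, and the adjustment is assumed to use the number of values passed in as the number of tests. Reading MAF files, writing the STOCKHOLM structure line, and running R-scape are not shown.

```python
from math import prod

SPACER = "A" * 10  # ten adenines, as described in the text


def has_indel(ref_row, row):
    """True if a row has a gap where the reference has none, or vice versa."""
    return any((r == "-") != (c == "-") for r, c in zip(ref_row, row))


def filter_and_merge(left_block, right_block, ref="hg19"):
    """Drop organisms with indels in either CCR and join the two blocks.

    Each block maps organism name -> aligned row (hypothetical layout).
    Kept rows share the reference gap pattern, so gaps are stripped.
    """
    merged = {}
    for name, left_row in left_block.items():
        right_row = right_block.get(name)
        if right_row is None:
            continue  # organism absent from one of the two blocks
        if has_indel(left_block[ref], left_row):
            continue
        if has_indel(right_block[ref], right_row):
            continue
        merged[name] = (left_row.replace("-", "") + SPACER
                        + right_row.replace("-", ""))
    return merged


def pccr_e_value(base_pair_e_values, significant):
    """Product of E values over base pairs flagged as significant.

    Returns None when no base pair is flagged (an assumption; the text
    does not say how such PCCRs were treated).
    """
    selected = [e for e, s in zip(base_pair_e_values, significant) if s]
    return prod(selected) if selected else None


def benjamini_hochberg(values):
    """Standard Benjamini-Hochberg step-up adjustment of a list of values."""
    m = len(values)
    order = sorted(range(m), key=lambda i: values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch, `filter_and_merge` would be applied per PCCR before writing the STOCKHOLM file, while `pccr_e_value` and `benjamini_hochberg` would be applied to values parsed from the R-scape output.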
