2.1. Basic conserved sequence detection method (BCSDm)

Maria Araceli Diaz Cruz
Dan Lund
Ferenc Szekeres
Sandra Karlsson
Maria Faresjö
Dennis Larsson

The Basic Conserved Sequence Detection method (BCSDm) extracts the most conserved sequences (those without insertions or deletions) and their locations within a given sequence region.

The BCSDm, implemented in Python 3.6 using Biopython [29], combines three steps: (1) alignment of the different sequences, (2) extraction of the alignment profile and its position-specific scoring matrix (PSSM), and (3) extraction of the conserved nucleotide patterns and their positions in the alignment (Figure S1).

Each gene sequence was aligned across the five mammalian species using the multiple alignment program MAFFT [30]. From the resulting alignment file, an alignment profile summarizing the alignment for the five species was obtained. The dumb consensus method [29] was used to extract the alignment profile and to count each nucleotide type at each position of the alignment across all sequences. If the proportion of the most common nucleotide at a position exceeded the default threshold (0.7), that nucleotide was added to the alignment profile. This method was used to avoid gaps in the extracted sequence pattern.

After obtaining the alignment profile, a PSSM was calculated to represent the probability of occurrence of each nucleotide at each position of the consensus sequence. Conserved patterns were extracted from the PSSM under two conditions. First, each position selected from the score matrix had to be 100% conserved, to exclude gaps, insertions, and substitutions. Second, to capture transcription factor binding sites (TFBS) while excluding patterns that arise by chance, only runs of ≥15 consecutive conserved nucleotides were retained.
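The extraction rules described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it does not call MAFFT or Biopython, it takes sequences that are already aligned as equal-length strings, and the function names (`column_counts`, `threshold_consensus`, `conserved_runs`) and the ambiguity symbol `N` are hypothetical. It also assumes that gap characters are ignored when the per-column proportion is computed.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def column_counts(alignment):
    """Count every character at each alignment column (a count-based profile)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    return [Counter(seq[i].upper() for seq in alignment) for i in range(length)]


def threshold_consensus(alignment, threshold=0.7, ambiguous="N"):
    """Keep the most common nucleotide of a column if its proportion exceeds
    the threshold; otherwise emit the (hypothetical) ambiguity symbol."""
    consensus = []
    for col in column_counts(alignment):
        # Assumption: gaps and other non-ACGT symbols are excluded from the total.
        bases = {b: c for b, c in col.items() if b in NUCLEOTIDES}
        total = sum(bases.values())
        best = max(bases, key=bases.get) if bases else None
        if best is not None and bases[best] / total > threshold:
            consensus.append(best)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


def conserved_runs(alignment, min_length=15):
    """Return (start, pattern) for each maximal run of columns in which all
    sequences carry the same nucleotide and no gap, at least min_length long.
    Start positions are 0-based alignment coordinates."""
    n_seqs = len(alignment)
    runs, start = [], None
    # An empty sentinel column closes a run that reaches the alignment end.
    for i, col in enumerate(column_counts(alignment) + [Counter()]):
        base, freq = col.most_common(1)[0] if col else ("-", 0)
        conserved = freq == n_seqs and base in NUCLEOTIDES
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_length:
                runs.append((start, alignment[0][start:i].upper()))
            start = None
    return runs


# Toy alignment (made-up sequences): a 16-nt identical block, one variable
# column that includes a gap, then a 5-nt identical block.
block = "ACGTACGTACGTACGT"
toy_alignment = [block + variable + "GGGGG" for variable in "ACGT-"]
print(threshold_consensus(toy_alignment))
print(conserved_runs(toy_alignment))
```

In the toy alignment only the 16-nucleotide block passes both conditions; the trailing 5-nucleotide block is fully conserved but falls below the length cutoff of 15.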

All 49 genes from the NRS were analyzed with BCSDm. Of these, 25 genes had at least one conserved intronic pattern and were therefore analyzed further. In total, 1,044 conserved intron patterns were extracted from these 25 genes using BCSDm. Conserved sequences were grouped according to the ordinal position of their intron in the transcript (intron 1 to intron 11), and the percentage of conserved sequences was calculated for each intron group. To account for the unequal number of introns among NR genes, the number of conserved patterns was normalized to the number of genes containing each intron. In addition, the number of conserved patterns in each intron was normalized to the intron's sequence length.
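The grouping and normalization described above can be illustrated with a short Python sketch. The exact formulas are not given in the text, so the three quantities computed here (share of all patterns, patterns per gene containing the intron, and patterns per kilobase of intron) are assumptions, as are the function name `summarize_by_intron`, the record layout, and the toy values, which are not data from the study.

```python
from collections import defaultdict


def summarize_by_intron(records):
    """Summarize conserved patterns per intron group.

    records: iterable of (gene, intron_number, n_patterns, intron_length_bp).
    """
    patterns = defaultdict(int)   # total conserved patterns per intron group
    genes = defaultdict(set)      # genes that contain this intron
    length = defaultdict(int)     # summed intron length in bp
    for gene, intron, n_patterns, intron_len in records:
        patterns[intron] += n_patterns
        genes[intron].add(gene)
        length[intron] += intron_len

    total = sum(patterns.values())
    summary = {}
    for intron in sorted(patterns):
        count = patterns[intron]
        summary[intron] = {
            "percent_of_patterns": 100.0 * count / total if total else 0.0,
            "patterns_per_gene": count / len(genes[intron]),
            "patterns_per_kb": 1000.0 * count / length[intron] if length[intron] else 0.0,
        }
    return summary


# Toy records with made-up gene names and counts, for illustration only.
toy_records = [
    ("geneA", 1, 6, 2000),
    ("geneB", 1, 2, 2000),
    ("geneA", 2, 2, 1000),
]
for intron, stats in summarize_by_intron(toy_records).items():
    print(intron, stats)
```

Dividing by the number of genes that contain a given intron keeps late introns, which are present in fewer genes, comparable with early ones; dividing by length does the same for long versus short introns.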

Exon sequences from the same 25 NRS genes were also analyzed with BCSDm. A total of 552 conserved exon patterns were extracted from these genes and analyzed further.
