2.7. Comparison of Various MSA Methods

Eugene V. Korotkov; Yulia M. Suvorova; Dmitrii O. Kostenko; Maria A. Korotkova


This method is extracted from research article: Genes (Basel), Jan 2021

Multiple Alignment of Promoter Sequences from the Arabidopsis thaliana L. Genome

DOI: 10.3390/genes12020135


The algorithm shown in Figure 1 can also be applied to determine the statistical significance of MSAs created by other algorithms. Let us denote the MSA as A, the length of each sequence in A as K, and the number of sequences as N. All sequences from A are linked to produce sequence S₃ of length L = KN. Then, the PWM is calculated for A using Formula (2), transformed using Formulas (3) and (4), and applied to create the two-dimensional alignment for sequence S₃ using Formulas (5) and (6) and to calculate F(L, L). The statistical significance of A is then computed according to Formula (7).
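The linking step can be illustrated with a minimal sketch. It assumes the MSA is given as N equal-length strings that keep their gap characters; the function name `concatenate_msa` and the toy alignment are hypothetical and are not taken from the article.

```python
def concatenate_msa(rows):
    """Join N aligned rows of equal length K into one sequence of length K*N."""
    if not rows:
        raise ValueError("alignment is empty")
    k = len(rows[0])
    if any(len(r) != k for r in rows):
        raise ValueError("all aligned rows must have the same length K")
    return "".join(rows)


# Toy alignment (illustrative only): N = 3 rows, K = 5 columns.
msa = ["ACG-T", "A-GTT", "ACGTT"]
s3 = concatenate_msa(msa)
L = len(s3)  # equals K * N
```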

However, the columns in which the sum of elements is < N/2 should be excluded from A to eliminate redundant deletions in the calculation of F(L, L), whereas those with a sum > N/2 cannot be excluded because that would lead to an excessive number of insertions. Consequently, the number of columns becomes K′ ≤ K, resulting in a new alignment A′ (K′ is the length of each sequence in A′).

To construct the PWM for A′, the frequency matrix M(K′, 16) was first calculated using Formula (1), and the PWM (designated W_A′) was then calculated using Formula (2). Formulas (3) and (4) were applied to transform the resulting matrix into matrix WT_A′, which was used to calculate F(L, L) (L = K′N) for A′. For this, the sequences from A′ were merged into sequence S₄ with all gaps preserved. At the same time, sequence S₅ was created, containing the column numbers {1, 2, …, K′} of the WT_A′ matrix repeated N times.

Then, for all i from 2 to L = K′N for which neither s₄(i − 1) nor s₄(i) was a gap, we accumulated the sum F₁ = F₁ + WT(s₅(i), n), where n = s₄(i − 1) + (s₄(i) − 1) × 4, whereas for those i for which s₄(i − 1) was a gap, the sum was accumulated as F₂ = F₂ + E(s₅(i), s₄(i)). Matrix E was calculated from the WT_A′ matrix using Formula (6). We also calculated F₃ = −k₁del, where k₁ is the number of gaps in alignment A′ and del is the insertion/deletion penalty (Formula (5)), as well as F₄ = −k₂del, where k₂ is the difference in the number of nucleotides between alignments A and A′. Finally, we calculated F(K′N, K′N) = F₅ = F₁ + F₂ − F₃ − F₄.
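The column filtering and the accumulation of F₁ and F₂ can be sketched as follows. This is an illustration under several assumptions, not the authors' implementation:

- The "sum of elements" of a column is read as its number of non-gap positions, and a column with exactly N/2 nucleotides is kept.
- Nucleotides are coded 1 to 4 in the arbitrary order shown in `CODE`.
- The matrices `wt` (K′ × 16) and `e` (K′ × 4) are supplied by the caller, because Formulas (1) to (6) are not reproduced in this section.
- Positions where s₄(i) itself is a gap add nothing to F₁ or F₂.
- Both gap penalties lower the total score, so the sketch computes F₅ = F₁ + F₂ − del·(k₁ + k₂). The sign convention should be checked against the original article.

All function names are hypothetical.

```python
GAP = "-"
# Assumed coding of nucleotides as 1..4; the source does not state the order.
CODE = {"A": 1, "T": 2, "C": 3, "G": 4}


def filter_columns(rows):
    """Drop columns holding fewer than N/2 nucleotides; return (A', k2)."""
    n_rows = len(rows)
    width = len(rows[0])
    counts = [sum(r[j] != GAP for r in rows) for j in range(width)]
    keep = [j for j in range(width) if 2 * counts[j] >= n_rows]
    filtered = ["".join(r[j] for j in keep) for r in rows]
    # k2: nucleotides present in A but lost when building A'.
    k2 = sum(counts) - sum(counts[j] for j in keep)
    return filtered, k2


def score_alignment(rows, wt, e, del_penalty, k2):
    """Accumulate F1 and F2 over S4 and combine them with the gap penalties."""
    s4 = "".join(rows)                       # rows of A' merged, gaps preserved
    k_prime = len(rows[0])
    s5 = [j + 1 for _ in rows for j in range(k_prime)]  # 1..K' repeated N times
    f1 = f2 = 0.0
    for i in range(1, len(s4)):              # i = 2..L in 1-based notation
        prev, cur = s4[i - 1], s4[i]
        if cur == GAP:
            continue                         # assumed: covered by the gap penalty
        col = s5[i] - 1
        if prev == GAP:
            f2 += e[col][CODE[cur] - 1]
        else:
            n = CODE[prev] + (CODE[cur] - 1) * 4   # dinucleotide index 1..16
            f1 += wt[col][n - 1]
    k1 = s4.count(GAP)                       # number of gaps in A'
    f5 = f1 + f2 - del_penalty * (k1 + k2)   # assumed sign convention
    return f1, f2, f5


# Toy data (illustrative only): the third column holds a single nucleotide.
rows = ["AC-GT", "AC-GT", "ACTGT", "A--GT"]
filtered, k2 = filter_columns(rows)
wt = [[1.0] * 16 for _ in range(len(filtered[0]))]
e = [[0.5] * 4 for _ in range(len(filtered[0]))]
f1, f2, f5 = score_alignment(filtered, wt, e, del_penalty=2.0, k2=k2)
```

With constant placeholder matrices the result only counts positions, which makes the bookkeeping easy to check by hand before real matrices are substituted.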

Weight matrix WT_A′ is the image of alignment A′, whose statistical significance can be estimated from how effectively the WT_A′ matrix aligns with random sequences. If the alignment is random, then matrix WT_A′ would be random too, and F₅ would be close to the value obtained for random sequences (Section 2.2).

Then, sequence S₄ was randomly shuffled to create 200 sequences, and matrix WT_A′ was included in the Q set as described in Section 2.5. Each of the 200 sequences was treated as described in Sections 2.2 to 2.4. As a result, 200 values of maxV(n₁), one for each random sequence, were obtained and used to calculate the mean maxV(n₁) and the standard deviation $\sqrt{D(\max V(n_{1}))}$. Then, we calculated Z using Formula (7), with F₅ used in place of maxV(n₁). Because Z was calculated by the same algorithm for MSAs constructed by different mathematical methods, including MAHDS, these MSAs could be compared based on their Z values (supplementary material 1).
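The shuffling and Z calculation can be sketched as below. Formula (7) is not reproduced in this section, so the sketch assumes the usual form Z = (F₅ − mean) / sqrt(D), with D taken as the population variance of the 200 random scores. The random scores themselves would come from the procedure of Sections 2.2 to 2.4, which is not implemented here. The function names and the fixed seed are hypothetical.

```python
import random
import statistics


def shuffle_sequences(seq, count=200, seed=0):
    """Return `count` random permutations of `seq` (composition is preserved)."""
    rng = random.Random(seed)   # fixed seed only to make the sketch repeatable
    shuffled = []
    for _ in range(count):
        chars = list(seq)
        rng.shuffle(chars)
        shuffled.append("".join(chars))
    return shuffled


def z_score(f5, random_scores):
    """Assumed form of Formula (7): (F5 - mean) / sqrt(D)."""
    mean = statistics.fmean(random_scores)
    sd = statistics.pstdev(random_scores)   # square root of population variance
    if sd == 0:
        raise ValueError("random scores have zero variance; Z is undefined")
    return (f5 - mean) / sd


# Toy numbers (illustrative only): two random scores with mean 3 and sd 1.
z = z_score(6.0, [2.0, 4.0])
randomized = shuffle_sequences("ACGTACGTAC")
```

Whether the sample or population variance is intended is not stated in this section; replacing `statistics.pstdev` with `statistics.stdev` switches between the two.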

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
