2.3. Comparison of Set Q with Sequence S Using Dynamic Programming

Eugene V. Korotkov; Yulia M. Suvorova; Dmitrii O. Kostenko; Maria A. Korotkova

Improve Research Reproducibility A Bio-protocol resource

Home
Protocols

Concise Method

2.3. Comparison of Set Q with Sequence S Using Dynamic Programming

EK Eugene V. Korotkov

YS Yulia M. Suvorova

DK Dmitrii O. Kostenko

MK Maria A. Korotkova

This method is extracted from research article: Genes (Basel), Jan 2021

Multiple Alignment of Promoter Sequences from the Arabidopsis thaliana L. Genome

DOI: 10.3390/genes12020135

Request a Protocol

Ask a question

Favorite

Then, we aligned sequence S with each of the matrices from set Q using the global alignment algorithm. The F value was calculated as:

where n = s(k) + 4(s(j) − 1)), i and j each ranges from 2 to L, and s(j) and s₁(i) are elements of sequences S and S₁, respectively. Parameter n reflects the fact that in matrix W, dinucleotides were taken into account. To determine n, the previous position (k), already included in the alignment, should be defined. It was calculated from the created transitions in matrix F, depending on the previous base in S and used to obtain W(s₁(i),n). If the previous base of sequence S is s(j − t), then k = j − t and n = s(j − t) + (s(j) − 1) × 4. Three cases should be considered. In the first, t = 1 corresponds to the movement along the main diagonal of matrix F and there is no deletion in sequence S in the alignment (Figure 3A). In the second, t > 1 corresponds to a deletion of t − 1 bases in sequence S (illustrated in Figure 3B for t = 2). Finally, deletions may occur simultaneously in both sequences S and S₁, which correspond to deletions of columns in matrix W. If the previous symbol in sequence S₁ has the number i − 1, then there is no deletion, but if this number is i − t (t > 1), then there is a deletion of t − 1 bases in sequence S₁. For these transitions, we did not consider the correlations of adjacent bases and took n = s(j). Rather than matrix W(s₁(i),n), we used matrix W₁(s₁(i),s(j)):

Three variants of the path from (i − 1, j − 1) to (i, j) to calculate F(i, j) using Formula (5) when t < 3. (A) The transition to (i − 1, j − 1) is from (i − 2, j − 2). Then, the previous position in the alignment is (i − 1, j − 1), which corresponds to t = 1. In this case, a pair of bases s(j − 1) = t and s(j) = g is chosen in sequence S and k = j − 1, whereas s(k) = s(j − 1) = t. Then, n = s(k) + 4(s(j) − 1)) = s(j − 1) + 4(s(j) − 1)) = 2 + 4 × (4 − 1) = 14, which means that we are using W(s₁(i),14). (B) The transition to (i − 1, j − 1) is from (i − 1, j − 2) and that to (i − 1, j − 2) is from (i − 2, j − 3). In this case, t = 2, k = j − 2, and s(k) = s(j − 2) = 2, n = (k) + 4(s(j) − 1)) = s(j − 2) + 4(s(j) − 1)) = 1 + 4 × (4 − 1) = 13, and we use W(s₁(i),13). (C) The transition to (i − 1, j − 1) is from (i − 2, j − 1) and that to (i − 2, j − 1) is from (i-3, j − 2). In this case, t = 2 and one symbol is deleted from the sequence. Then, we do not take into account the base correlation in sequence S and use matrix W₁(s₁(i),s(j)) and n = s(j) = 4.

In this case, the correlation of adjacent bases is not considered, which is quite acceptable when the number of deletions is relatively small (illustrated in Figure 3C for t = 2).

The zero row and column of matrix F were filled with negative numbers, F(0, j) and F(i, 0) were 0 for i and j ranging from 1 to L, respectively, and F(0, 0), F(1, 0), …, F(2, 0) were also equal to 0. Matrix E(x, n) was used to define the first column and row of matrix F. The insertion/deletion penalty value (del = 25.0) was selected based on our earlier work [25]. The reverse transition matrix was filled along with matrix F. Therefore, we aligned sequences S₁ and S using the reverse transition matrix and determined F(L, L). The alignment of S₁ and S was obtained for all matrices from the Q set. As a result, vector V(i) (i = 1, …, n₁) contained F(L, L) for each matrix.

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

Post a Question

0 Q&A

Share your protocol with your peers.

Submit a Preprint Protocol