2.3. Comparison of Set Q with Sequence S Using Dynamic Programming

EK Eugene V. Korotkov
YS Yulia M. Suvorova
DK Dmitrii O. Kostenko
MK Maria A. Korotkova
request Request a Protocol
ask Ask a question
Favorite

Then, we aligned sequence S with each of the matrices from set Q using the global alignment algorithm. The F value was calculated as:

where n = s(k) + 4(s(j) − 1)), i and j each ranges from 2 to L, and s(j) and s1(i) are elements of sequences S and S1, respectively. Parameter n reflects the fact that in matrix W, dinucleotides were taken into account. To determine n, the previous position (k), already included in the alignment, should be defined. It was calculated from the created transitions in matrix F, depending on the previous base in S and used to obtain W(s1(i),n). If the previous base of sequence S is s(jt), then k = jt and n = s(jt) + (s(j) − 1) × 4. Three cases should be considered. In the first, t = 1 corresponds to the movement along the main diagonal of matrix F and there is no deletion in sequence S in the alignment (Figure 3A). In the second, t > 1 corresponds to a deletion of t − 1 bases in sequence S (illustrated in Figure 3B for t = 2). Finally, deletions may occur simultaneously in both sequences S and S1, which correspond to deletions of columns in matrix W. If the previous symbol in sequence S1 has the number i − 1, then there is no deletion, but if this number is it (t > 1), then there is a deletion of t − 1 bases in sequence S1. For these transitions, we did not consider the correlations of adjacent bases and took n = s(j). Rather than matrix W(s1(i),n), we used matrix W1(s1(i),s(j)):

Three variants of the path from (i − 1, j − 1) to (i, j) to calculate F(i, j) using Formula (5) when t < 3. (A) The transition to (i − 1, j − 1) is from (i − 2, j − 2). Then, the previous position in the alignment is (i − 1, j − 1), which corresponds to t = 1. In this case, a pair of bases s(j − 1) = t and s(j) = g is chosen in sequence S and k = j − 1, whereas s(k) = s(j − 1) = t. Then, n = s(k) + 4(s(j) − 1)) = s(j − 1) + 4(s(j) − 1)) = 2 + 4 × (4 − 1) = 14, which means that we are using W(s1(i),14). (B) The transition to (i − 1, j − 1) is from (i − 1, j − 2) and that to (i − 1, j − 2) is from (i − 2, j − 3). In this case, t = 2, k = j − 2, and s(k) = s(j − 2) = 2, n = (k) + 4(s(j) − 1)) = s(j − 2) + 4(s(j) − 1)) = 1 + 4 × (4 − 1) = 13, and we use W(s1(i),13). (C) The transition to (i − 1, j − 1) is from (i − 2, j − 1) and that to (i − 2, j − 1) is from (i-3, j − 2). In this case, t = 2 and one symbol is deleted from the sequence. Then, we do not take into account the base correlation in sequence S and use matrix W1(s1(i),s(j)) and n = s(j) = 4.

In this case, the correlation of adjacent bases is not considered, which is quite acceptable when the number of deletions is relatively small (illustrated in Figure 3C for t = 2).

The zero row and column of matrix F were filled with negative numbers, F(0, j) and F(i, 0) were 0 for i and j ranging from 1 to L, respectively, and F(0, 0), F(1, 0), …, F(2, 0) were also equal to 0. Matrix E(x, n) was used to define the first column and row of matrix F. The insertion/deletion penalty value (del = 25.0) was selected based on our earlier work [25]. The reverse transition matrix was filled along with matrix F. Therefore, we aligned sequences S1 and S using the reverse transition matrix and determined F(L, L). The alignment of S1 and S was obtained for all matrices from the Q set. As a result, vector V(i) (i = 1, …, n1) contained F(L, L) for each matrix.

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

post Post a Question
0 Q&A