Constrained Baum–Welch algorithm

Jiefu Li, Jung-Youn Lee, Li Liao

The standard Baum–Welch algorithm is an Expectation-Maximization (EM) approach to maximizing the likelihood when the system contains latent variables, which for hidden Markov models are the hidden state sequences when the training sequences are not labelled. Our constrained Baum–Welch algorithm (cBW) is similar to the standard Baum–Welch algorithm except that the training sequences are partially labelled, which imposes constraints on the possible hidden state paths when calculating the expectation. The standard Baum–Welch algorithm is divided into an E-step and an M-step. The M-step of the cBW algorithm is identical to that of standard Baum–Welch; the difference lies in the E-step, which computes the forward and backward matrices. The forward matrix α has dimension N×T, where N is the number of states and T is the sequence length. An element α_i(t) is the probability of the observed sequence up to and including O_t, with the symbol O_t being emitted from state i. The backward matrix β, also of dimension N×T, has element β_i(t) as the probability of the observed sequence from position t+1 to the end, given that the symbol O_t is emitted from state i. The formulas for computing α and β are given below.

Given the model θ = (π, A, B), where π is an N-dimensional vector with π_i being the probability that any hidden state path starts with state i, the initial values of the forward matrix α for a given training sequence O = (O_1, …, O_T) are computed as follows.
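A reconstruction from the description in the text, assuming the partial-label constraint is honored at the first position just as at later positions:

    α_i(1) = π_i b_i(O_1)    if L(O_1) ∈ {0, i}
    α_i(1) = 0               otherwise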

After calculating the initial values of α, the remaining values at any position for any state are calculated recursively by dynamic programming, summing over the possible state paths X = (X_1, …, X_T), allowed by the model, that lead to the point whose α value is being calculated. However, since we now have partial labels for the training sequence O, care must be taken to satisfy the constraint at each position O_t imposed by the partial label, L(O_t) ∈ S ∪ {0}, where S is the set of states and the value zero means no label is available. Specifically,
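A reconstruction from the two cases described below, with a_{ji} denoting the transition probability from state j to state i and b_i(O_{t+1}) the probability of state i emitting symbol O_{t+1}:

    α_i(t+1) = b_i(O_{t+1}) Σ_{j=1}^{N} α_j(t) a_{ji}    if L(O_{t+1}) ∈ {0, i}
    α_i(t+1) = 0                                         if L(O_{t+1}) ∉ {0, i}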

In the above equation, the first case is when position O_{t+1} is either unconstrained (0) or constrained to be state i by the partial label. In that case, the α value is computed in the same way as in the standard Baum–Welch algorithm, though the actual value can still be affected by partial labels at earlier positions via the recursion. The second case is when position t+1 is constrained by the partial label to be a state other than i, in which case α_i(t+1) = 0. This latter case is what distinguishes the algorithm from the standard Baum–Welch algorithm, in order to “honor” the partial labels. The backward matrix β is initialized as follows.
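A reconstruction, assuming the constraint at the final position T is applied in the same way as elsewhere:

    β_i(T) = 1    if L(O_T) ∈ {0, i}
    β_i(T) = 0    otherwise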

Then, similarly, a recursive procedure is applied for the rest of the backward matrix.
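A reconstruction consistent with the constrained forward recursion, zeroing β_i(t) whenever the label at position t excludes state i so that paths violating the labels contribute nothing:

    β_i(t) = Σ_{j=1}^{N} a_{ij} b_j(O_{t+1}) β_j(t+1)    if L(O_t) ∈ {0, i}
    β_i(t) = 0                                           otherwise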

Note that while α is calculated the same way as in the modified Forward algorithm of [12], β is calculated differently from their modified Backward algorithm. After α and β have been calculated, we can compute the variable γ, where γ_i(t) is the probability of observing the training sequence O over all state paths that are allowed by the hidden Markov model θ as constrained by the partial labels and that go through state i at position t. γ_i(t) is computed as follows.
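Reconstructed from the definitions of α and β and the note following the equation:

    γ_i(t) = α_i(t) β_i(t) / P(O|θ) = α_i(t) β_i(t) / Σ_{j=1}^{N} α_j(t) β_j(t)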

where the last equal sign holds because P(O|θ) = Σ_{j=1}^{N} α_j(t) β_j(t). The next step is to compute ξ_ij(t), which is the probability of observing the training sequence O over all state paths that are allowed by the hidden Markov model θ as constrained by the partial labels and that go through state i at position t and transition to state j at position t+1:
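A reconstruction, assuming the standard Baum–Welch form evaluated with the constrained α and β:

    ξ_ij(t) = α_i(t) a_{ij} b_j(O_{t+1}) β_j(t+1) / P(O|θ)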

Finally, with γ and ξ, the M-step updates the initial probabilities π, every element a_{ij} of the transition matrix A, and every element b_i(o_k) of the emission matrix B.
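A reconstruction of the standard single-sequence updates, using the indicator notation defined below:

    π_i = γ_i(1)
    a_{ij} = Σ_{t=1}^{T-1} ξ_ij(t) / Σ_{t=1}^{T-1} γ_i(t)
    b_i(o_k) = Σ_{t=1}^{T} I_{O(t)=o_k} γ_i(t) / Σ_{t=1}^{T} γ_i(t)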

where I_{O(t)=o_k} stands for the indicator function, which equals 1 if O(t) = o_k and 0 otherwise. For the case of multiple sequences, with each sequence indexed by s and m sequences in total, the only change is in the updates of π, A, and B, as follows.
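A reconstruction, assuming the standard pooling of expected counts across sequences, where γ^s and ξ^s denote the per-sequence quantities and T_s is the length of sequence s:

    π_i = (1/m) Σ_{s=1}^{m} γ^s_i(1)
    a_{ij} = Σ_{s=1}^{m} Σ_{t=1}^{T_s-1} ξ^s_ij(t) / Σ_{s=1}^{m} Σ_{t=1}^{T_s-1} γ^s_i(t)
    b_i(o_k) = Σ_{s=1}^{m} Σ_{t=1}^{T_s} I_{O^s(t)=o_k} γ^s_i(t) / Σ_{s=1}^{m} Σ_{t=1}^{T_s} γ^s_i(t)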

The procedure above is repeated until either Σ_{s=1}^{m} log P(O^s|θ) converges or the maximum number of iterations set by the user is reached. As mentioned in the Introduction section, a key difference between our method and [6] lies in the E-step for calculating the expected value for a given emission or transition. Our method handles the partial-label constraints recursively for α and β, whereas [6] calculates α and β without using the partial labels and only uses them to reset γ at each partially labelled position independently, as if partial labels elsewhere had no effect on the position being considered. The E-step of the Baum–Welch algorithm invokes the forward and backward algorithms, which are essentially dynamic programming procedures for efficiently calculating the likelihood Pr(O|θ) = Σ_{X∈Γ} Pr(O,X|θ), with Γ being the set of all hidden state paths, and hence should give the same result as computing the likelihood by exhaustively summing the probabilities of all possible hidden state paths. Therefore, we believe that since the partial labels restrict the possible hidden state paths, i.e., Pr(O|θ) = Σ_{X∈Γ′} Pr(O,X|θ) with Γ′ being the set of hidden paths consistent with the partial labels, such constraints should be handled recursively in the dynamic programming. Figure 1 shows an example of the forward/backward dynamic programming table construction. Another advantage of our method compared with the method in [6] is that our training method preserves the topology of the initial transition and emission matrices, as standard Baum–Welch does. In other words, if prior knowledge of the model topology is available, our training method for partially labelled data retains that knowledge to the end of training.
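To make the recursive handling of the constraints concrete, the following is a minimal sketch of the constrained forward and backward passes in Python with NumPy. It is an illustration under our own conventions, not the authors' code: states are indexed 0..N−1, L[t] = −1 marks an unlabelled position, and the numerical scaling needed in practice for long sequences is omitted for clarity.

    import numpy as np

    def constrained_forward(O, L, pi, A, B):
        """alpha[i, t]: probability of O[0..t] with O[t] emitted from state i,
        summed only over state paths consistent with the partial labels L
        (L[t] == -1 means position t is unlabelled)."""
        N, T = A.shape[0], len(O)
        alpha = np.zeros((N, T))
        for i in range(N):
            if L[0] in (-1, i):                      # honor the label at the first position
                alpha[i, 0] = pi[i] * B[i, O[0]]
        for t in range(T - 1):
            for i in range(N):
                if L[t + 1] in (-1, i):
                    alpha[i, t + 1] = B[i, O[t + 1]] * alpha[:, t].dot(A[:, i])
                # else alpha[i, t+1] stays 0: paths violating the label are excluded
        return alpha

    def constrained_backward(O, L, A, B):
        """beta[i, t]: probability of O[t+1..T-1] given state i at position t,
        zeroed whenever the label at position t excludes state i."""
        N, T = A.shape[0], len(O)
        beta = np.zeros((N, T))
        for i in range(N):
            if L[T - 1] in (-1, i):
                beta[i, T - 1] = 1.0
        for t in range(T - 2, -1, -1):
            for i in range(N):
                if L[t] in (-1, i):
                    beta[i, t] = np.sum(A[i, :] * B[:, O[t + 1]] * beta[:, t + 1])
        return beta

With these matrices, γ and ξ are computed exactly as above, and the M-step is unchanged from standard Baum–Welch.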

Fig. 1. Example of the constrained Baum–Welch forward/backward dynamic programming table construction. Positions 1–2 illustrate the case of a labelled position to an unlabelled position; positions 3–4 the case of an unlabelled position to a labelled position; positions 4–5 the case of a labelled position to a labelled position. Dashed lines indicate state transitions. Generated by Google Drawings.
