The standard Baum–Welch algorithm is an Expectation–Maximization approach to maximizing the likelihood when the system contains latent variables, which, for hidden Markov models trained on unlabelled sequences, are the hidden state sequences. Our constrained Baum–Welch algorithm (cBW) is similar to the standard Baum–Welch algorithm except that the training sequences are partially labelled, which imposes constraints on the possible hidden state paths when calculating the expectation. The standard Baum–Welch algorithm is divided into an E-step and an M-step. The M-step of the cBW algorithm is identical to that of the standard Baum–Welch algorithm. The difference lies in the E-step, which computes the forward and backward matrices. The forward matrix $\alpha$ is of dimension $N \times T$, where $N$ is the number of states and $T$ is the sequence length. An element $\alpha_t(i)$ is the probability of the observed sequence up to and including position $t$, with the symbol at position $t$ being emitted from state $i$. The backward matrix $\beta$, also of dimension $N \times T$, has element $\beta_t(i)$ as the probability of the observed sequence from position $t$ to the end, with the symbol at position $t$ being emitted from state $i$. The formulas for computing $\alpha_t(i)$ and $\beta_t(i)$ are given below, respectively.
Given the model $\lambda = (A, B, \pi)$, where $\pi$ is an $N$-dimensional vector with $\pi_i$ being the probability that a hidden state path starts with state $i$, the initial values of the forward matrix for a given training sequence $O$ are computed as follows.
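A sketch of this initialization in standard HMM notation (the symbols $\pi_i$ and $b_i(\cdot)$ are assumed, Rabiner-style conventions rather than the authors' exact typography; $l_1$ denotes the partial label at position 1, with zero meaning unlabelled, as defined below):

$$
\alpha_1(i) =
\begin{cases}
\pi_i\, b_i(O_1), & \text{if } l_1 = 0 \text{ or } l_1 = i,\\
0, & \text{otherwise.}
\end{cases}
$$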
After calculating the initial values $\alpha_1(i)$, the remaining values $\alpha_t(i)$ at any position $t$ for any state $i$ are calculated recursively by dynamic programming, summing over the possible state paths, allowed by the model, that lead to the point whose value is being calculated. However, since we now have partial labels for the training sequence $O$, care must be taken to satisfy the constraint at each position imposed by the partial label $l_t$, where a value of zero means no label is available. Specifically,
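a sketch of the constrained forward recursion in the assumed notation, with $a_{ji}$ the transition probability from state $j$ to state $i$ and $b_i(O_t)$ the probability of state $i$ emitting symbol $O_t$, would be:

$$
\alpha_t(i) =
\begin{cases}
b_i(O_t) \displaystyle\sum_{j=1}^{N} \alpha_{t-1}(j)\, a_{ji}, & \text{if } l_t = 0 \text{ or } l_t = i,\\
0, & \text{otherwise,}
\end{cases}
\qquad t = 2, \dots, T.
$$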
In the above equation, the first case is when position $t$ is either unconstrained ($l_t = 0$) or constrained to be state $i$ by the partial label. In such a case, the value is computed in the same way as in the standard Baum–Welch algorithm, though the actual value can still be affected by partial labels at earlier positions via the recursion. The second case is when position $t$ is constrained by the partial label to be a state other than $i$; in this case, $\alpha_t(i) = 0$. This latter case is what makes the algorithm different from the standard Baum–Welch algorithm, in order to “honor” the partial labels. The backward matrix is initialized as follows.
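A sketch of the constrained initialization of the backward matrix under the same assumptions:

$$
\beta_T(i) =
\begin{cases}
1, & \text{if } l_T = 0 \text{ or } l_T = i,\\
0, & \text{otherwise.}
\end{cases}
$$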
Then, similarly, a recursive procedure is applied for the remaining entries of the backward matrix.
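A sketch of the constrained backward recursion in the assumed notation (this form follows the standard backward recursion with the constraint applied, chosen so that the identity $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_t(i)\,\beta_t(i)$ used below holds; the exact published form may differ):

$$
\beta_t(i) =
\begin{cases}
\displaystyle\sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j), & \text{if } l_t = 0 \text{ or } l_t = i,\\
0, & \text{otherwise,}
\end{cases}
\qquad t = 1, \dots, T-1.
$$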
Note that, while $\alpha_t(i)$ is calculated the same way as in the modified Forward algorithm in [12], $\beta_t(i)$ is calculated differently from their modified Backward algorithm. After the calculations of $\alpha_t(i)$ and $\beta_t(i)$, we can calculate the variable $\gamma_t(i)$, which is the probability of observing the training sequence $O$ from all possible state paths that are allowed by the hidden Markov model as constrained by the partial labels and that go through state $i$ at position $t$. $\gamma_t(i)$ is computed as follows.
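In the assumed notation, this computation can be written as:

$$
\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)},
$$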
where the last equal sign holds because $P(O \mid \lambda) = \sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)$. The next step is to compute $\xi_t(i,j)$, which is the probability of observing the training sequence $O$ from all possible state paths that are allowed by the hidden Markov model as constrained by the partial labels and that go through state $i$ at position $t$ and transition to state $j$ at position $t+1$:
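A sketch of this quantity, normalized by $P(O \mid \lambda)$ as in the standard Baum–Welch algorithm (again in the assumed notation):

$$
\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}, \qquad t = 1, \dots, T-1.
$$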
Finally, with $\gamma_t(i)$ and $\xi_t(i,j)$, the M-step updates the initial probabilities $\pi_i$, every element $a_{ij}$ of the transition matrix $A$, and every element $b_i(v_k)$ of the emission matrix $B$:
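These updates follow the standard Baum–Welch re-estimation formulas, sketched here in the assumed notation with $v_k$ denoting the $k$-th symbol of the emission alphabet:

$$
\bar{\pi}_i = \gamma_1(i), \qquad
\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
\bar{b}_i(v_k) = \frac{\sum_{t=1}^{T} \gamma_t(i)\,\mathbb{1}(O_t = v_k)}{\sum_{t=1}^{T} \gamma_t(i)},
$$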
where $\mathbb{1}(\cdot)$ stands for the indicator function, which equals 1 if $O_t = v_k$, and 0 otherwise. Then, for the case of multiple sequences, with each sequence indexed by $s$ and the total number of sequences being $m$, the only change is in the updating of $\pi_i$, $a_{ij}$, and $b_i(v_k)$, as follows.
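A sketch of the multiple-sequence updates, assuming the sufficient statistics are summed over the $m$ sequences (the superscript $(s)$ marks quantities computed on the $s$-th sequence, of length $T_s$):

$$
\bar{\pi}_i = \frac{1}{m}\sum_{s=1}^{m} \gamma_1^{(s)}(i), \qquad
\bar{a}_{ij} = \frac{\sum_{s=1}^{m}\sum_{t=1}^{T_s - 1} \xi_t^{(s)}(i,j)}{\sum_{s=1}^{m}\sum_{t=1}^{T_s - 1} \gamma_t^{(s)}(i)}, \qquad
\bar{b}_i(v_k) = \frac{\sum_{s=1}^{m}\sum_{t=1}^{T_s} \gamma_t^{(s)}(i)\,\mathbb{1}\big(O_t^{(s)} = v_k\big)}{\sum_{s=1}^{m}\sum_{t=1}^{T_s} \gamma_t^{(s)}(i)}.
$$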
The procedure above is repeated until either the likelihood $P(O \mid \lambda)$ converges or the maximum number of iterations set by the user is reached. As mentioned in the Introduction section, a key difference between our method and [6] lies in the E-step, which calculates the expected value for a given emission or transition. Our method handles the partial label constraints recursively for $\alpha_t(i)$ and $\beta_t(i)$, whereas [6] calculates $\alpha_t(i)$ and $\beta_t(i)$ without using the partial labels and only uses the partial labels to reset values at each partially labelled position independently, as if partial labels elsewhere had no effect on the position being considered. The E-step in the Baum–Welch algorithm invokes the forward and backward algorithms, which are essentially dynamic programming to more efficiently calculate the likelihood $P(O \mid \lambda) = \sum_{Q \in \mathcal{Q}} P(O, Q \mid \lambda)$, with $\mathcal{Q}$ being the set of all hidden state paths, and hence should give the same result as computing the likelihood by exhaustively summing the probability over all possible hidden state paths. Therefore, we believe, the partial labels should restrict the possible hidden state paths, $P(O \mid \lambda, L) = \sum_{Q \in \mathcal{Q}_L} P(O, Q \mid \lambda)$, with $\mathcal{Q}_L$ being the set of all hidden state paths constrained by the partial labels $L$, and such constraints should be handled recursively in the dynamic programming. Figure 1 shows an example of the forward/backward dynamic programming table construction. Another advantage of our method compared with the method in [6] is that our training can keep the topology of the initial transition and emission guesses for the model, as standard Baum–Welch does. In other words, if prior knowledge is available about the model topology, our training method for partially labelled data can preserve that knowledge to the end of training.
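To make the recursive constraint handling concrete, the following is a minimal Python sketch of the constrained forward pass described above (an illustration under assumed conventions, not the authors' implementation; unlabelled positions are encoded as -1 here because states are 0-indexed, whereas the text uses 0 for "no label"). In practice, scaling or log-space arithmetic would be needed to avoid numerical underflow on long sequences.

```python
import numpy as np

def constrained_forward(pi, A, B, obs, labels):
    """Constrained forward pass (illustrative sketch, not the authors' code).

    pi:     (N,)   initial state probabilities
    A:      (N, N) transition probabilities, A[j, i] = P(state i at t+1 | state j at t)
    B:      (N, K) emission probabilities,  B[i, k] = P(symbol k | state i)
    obs:    (T,)   observed symbol indices
    labels: (T,)   partial labels; -1 means unlabelled, otherwise the
                   required (0-indexed) state at that position
    Returns the N x T constrained forward matrix alpha.
    """
    N, T = len(pi), len(obs)
    alpha = np.zeros((N, T))

    # Initialization: honor the partial label at the first position, if any.
    for i in range(N):
        if labels[0] in (-1, i):
            alpha[i, 0] = pi[i] * B[i, obs[0]]

    # Recursion: a state disallowed by the partial label gets probability 0,
    # and that zero propagates to later positions through the sum over
    # predecessors, so constraints are handled recursively rather than by
    # resetting each labelled position independently.
    for t in range(1, T):
        for i in range(N):
            if labels[t] in (-1, i):
                alpha[i, t] = B[i, obs[t]] * np.sum(alpha[:, t - 1] * A[:, i])
    return alpha
```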
Figure 1. Example of constrained Baum–Welch's forward/backward dynamic programming table construction. Positions 1–2 show the case of a labelled position followed by an unlabelled position. Positions 3–4 show the case of an unlabelled position followed by a labelled position. Positions 4–5 show the case of a labelled position followed by a labelled position. Dashed lines indicate state transitions. Generated by Google Drawings