Constrained Baum–Welch algorithm

Jiefu Li, Jung-Youn Lee, Li Liao

The standard Baum–Welch algorithm is an Expectation-Maximization (EM) approach to maximizing the likelihood when the system contains latent variables, which for hidden Markov models are the hidden state sequences when the training sequences are not labelled. Our constrained Baum–Welch algorithm (cBW) is similar to the standard Baum–Welch algorithm except that the training sequences are partially labelled, which imposes constraints on the possible hidden state paths when calculating the expectation. The standard Baum–Welch algorithm is divided into an E-step and an M-step. The M-step of the cBW algorithm is identical to that of standard Baum–Welch; the difference lies in the E-step, which computes the forward and backward matrices. The forward matrix α has dimension N×T, where N is the number of states and T is the sequence length. An element α_i(t) is the probability of the observed sequence up to and including O_t, with the symbol O_t being emitted from state i. The backward matrix β, also of dimension N×T, has element β_i(t) as the probability of the observed sequence from position t+1 to the end, given that the symbol O_t is emitted from state i. The formulas for computing α and β are given below.

Given the model θ = (π, A, B), where π is an N-dimensional vector with π_i being the probability that any hidden state path starts with state i, the initial values of the forward matrix α for a given training sequence O = (O_1, …, O_T) are computed as follows.
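A reconstruction from the description in the text, assuming the partial-label constraint is honored at the first position just as at later positions:

    α_i(1) = π_i b_i(O_1)    if L(O_1) ∈ {0, i}
    α_i(1) = 0               otherwise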

After calculating the initial values of α, the remaining values at any position for any state are calculated recursively by dynamic programming, summing over the possible state paths X = (X_1, …, X_T), allowed by the model, that lead to the point whose α value is being calculated. However, since we now have partial labels for the training sequence O, care must be taken to satisfy the constraint at each position O_t imposed by the partial label, L(O_t) ∈ S ∪ {0}, where S is the set of states and the value zero means no label is available. Specifically,
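A reconstruction from the two cases described below, with a_{ji} denoting the transition probability from state j to state i and b_i(O_{t+1}) the probability of state i emitting symbol O_{t+1}:

    α_i(t+1) = b_i(O_{t+1}) Σ_{j=1}^{N} α_j(t) a_{ji}    if L(O_{t+1}) ∈ {0, i}
    α_i(t+1) = 0                                         if L(O_{t+1}) ∉ {0, i}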

In the above equation, the first case is when position O_{t+1} is either unconstrained (0) or constrained to be state i by the partial label. In that case, the α value is computed in the same way as in the standard Baum–Welch algorithm, though the actual value can still be affected by partial labels at earlier positions via the recursion. The second case is when position t+1 is constrained by the partial label to be a state other than i, in which case α_i(t+1) = 0. This latter case is what distinguishes the algorithm from the standard Baum–Welch algorithm, in order to “honor” the partial labels. The backward matrix β is initialized as follows.
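A reconstruction, assuming the constraint at the final position T is applied in the same way as elsewhere:

    β_i(T) = 1    if L(O_T) ∈ {0, i}
    β_i(T) = 0    otherwise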

Then, similarly, a recursive procedure is applied for the rest of the backward matrix.
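A reconstruction consistent with the constrained forward recursion, zeroing β_i(t) whenever the label at position t excludes state i so that paths violating the labels contribute nothing:

    β_i(t) = Σ_{j=1}^{N} a_{ij} b_j(O_{t+1}) β_j(t+1)    if L(O_t) ∈ {0, i}
    β_i(t) = 0                                           otherwise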

Note that while α is calculated the same way as in the modified Forward algorithm of [12], β is calculated differently from their modified Backward algorithm. After α and β have been calculated, we can compute the variable γ, where γ_i(t) is the probability of observing the training sequence O over all state paths that are allowed by the hidden Markov model θ as constrained by the partial labels and that go through state i at position t. γ_i(t) is computed as follows.
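Reconstructed from the definitions of α and β and the note following the equation:

    γ_i(t) = α_i(t) β_i(t) / P(O|θ) = α_i(t) β_i(t) / Σ_{j=1}^{N} α_j(t) β_j(t)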

where the last equal sign holds because P(O|θ) = Σ_{j=1}^{N} α_j(t) β_j(t). The next step is to compute ξ_ij(t), which is the probability of observing the training sequence O over all state paths that are allowed by the hidden Markov model θ as constrained by the partial labels and that go through state i at position t and transition to state j at position t+1:
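A reconstruction, assuming the standard Baum–Welch form evaluated with the constrained α and β:

    ξ_ij(t) = α_i(t) a_{ij} b_j(O_{t+1}) β_j(t+1) / P(O|θ)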

Finally, with γ and ξ, the M-step updates the initial probabilities π, every element a_{ij} of the transition matrix A, and every element b_i(o_k) of the emission matrix B.
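A reconstruction of the standard single-sequence updates, using the indicator notation defined below:

    π_i = γ_i(1)
    a_{ij} = Σ_{t=1}^{T-1} ξ_ij(t) / Σ_{t=1}^{T-1} γ_i(t)
    b_i(o_k) = Σ_{t=1}^{T} I_{O(t)=o_k} γ_i(t) / Σ_{t=1}^{T} γ_i(t)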

where I_{O(t)=o_k} stands for the indicator function, which equals 1 if O(t) = o_k and 0 otherwise. For the case of multiple sequences, with each sequence indexed by s and m sequences in total, the only change is in the updates of π, A, and B, as follows.
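A reconstruction, assuming the standard pooling of expected counts across sequences, where γ^s and ξ^s denote the per-sequence quantities and T_s is the length of sequence s:

    π_i = (1/m) Σ_{s=1}^{m} γ^s_i(1)
    a_{ij} = Σ_{s=1}^{m} Σ_{t=1}^{T_s-1} ξ^s_ij(t) / Σ_{s=1}^{m} Σ_{t=1}^{T_s-1} γ^s_i(t)
    b_i(o_k) = Σ_{s=1}^{m} Σ_{t=1}^{T_s} I_{O^s(t)=o_k} γ^s_i(t) / Σ_{s=1}^{m} Σ_{t=1}^{T_s} γ^s_i(t)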

The procedure above is repeated until either Σ_{s=1}^{m} log P(O^s|θ) converges or the maximum number of iterations set by the user is reached. As mentioned in the Introduction section, a key difference between our method and [6] lies in the E-step for calculating the expected value for a given emission or transition. Our method handles the partial-label constraints recursively for α and β, whereas [6] calculates α and β without using the partial labels and only uses them to reset γ at each partially labelled position independently, as if partial labels elsewhere had no effect on the position being considered. The E-step of the Baum–Welch algorithm invokes the forward and backward algorithms, which are essentially dynamic programming procedures for efficiently calculating the likelihood Pr(O|θ) = Σ_{X∈Γ} Pr(O,X|θ), with Γ being the set of all hidden state paths, and hence should give the same result as computing the likelihood by exhaustively summing the probabilities of all possible hidden state paths. Therefore, we believe that since the partial labels restrict the possible hidden state paths, i.e., Pr(O|θ) = Σ_{X∈Γ′} Pr(O,X|θ) with Γ′ being the set of hidden paths consistent with the partial labels, such constraints should be handled recursively in the dynamic programming. Figure 1 shows an example of the forward/backward dynamic programming table construction. Another advantage of our method compared with the method in [6] is that our training method preserves the topology of the initial transition and emission matrices, as standard Baum–Welch does. In other words, if prior knowledge of the model topology is available, our training method for partially labelled data retains that knowledge to the end of training.
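To make the recursive handling of the constraints concrete, the following is a minimal sketch of the constrained forward and backward passes in Python with NumPy. It is an illustration under our own conventions, not the authors' code: states are indexed 0..N−1, L[t] = −1 marks an unlabelled position, and the numerical scaling needed in practice for long sequences is omitted for clarity.

    import numpy as np

    def constrained_forward(O, L, pi, A, B):
        """alpha[i, t]: probability of O[0..t] with O[t] emitted from state i,
        summed only over state paths consistent with the partial labels L
        (L[t] == -1 means position t is unlabelled)."""
        N, T = A.shape[0], len(O)
        alpha = np.zeros((N, T))
        for i in range(N):
            if L[0] in (-1, i):                      # honor the label at the first position
                alpha[i, 0] = pi[i] * B[i, O[0]]
        for t in range(T - 1):
            for i in range(N):
                if L[t + 1] in (-1, i):
                    alpha[i, t + 1] = B[i, O[t + 1]] * alpha[:, t].dot(A[:, i])
                # else alpha[i, t+1] stays 0: paths violating the label are excluded
        return alpha

    def constrained_backward(O, L, A, B):
        """beta[i, t]: probability of O[t+1..T-1] given state i at position t,
        zeroed whenever the label at position t excludes state i."""
        N, T = A.shape[0], len(O)
        beta = np.zeros((N, T))
        for i in range(N):
            if L[T - 1] in (-1, i):
                beta[i, T - 1] = 1.0
        for t in range(T - 2, -1, -1):
            for i in range(N):
                if L[t] in (-1, i):
                    beta[i, t] = np.sum(A[i, :] * B[:, O[t + 1]] * beta[:, t + 1])
        return beta

With these matrices, γ and ξ are computed exactly as above, and the M-step is unchanged from standard Baum–Welch.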

Fig. 1. Example of the constrained Baum–Welch forward/backward dynamic programming table construction. Positions 1–2 illustrate the case of a labelled position to an unlabelled position; positions 3–4 the case of an unlabelled position to a labelled position; positions 4–5 the case of a labelled position to a labelled position. Dashed lines indicate state transitions. Generated by Google Drawings.
