Stepwise iterative maximum likelihood method

Alok Sharma; Daichi Shigemizu; Keith A. Boroevich; Yosvany López; Yoichiro Kamatani; Michiaki Kubo; Tatsuhiko Tsunoda

This method is extracted from research article: BMC Bioinformatics, Aug 2016

Stepwise iterative maximum likelihood clustering approach

DOI: 10.1186/s12859-016-1184-5

In this section, we describe our proposed method, which seeks an optimal partition iteratively. We begin with an initial partition of the data, shift a sample from one cluster to another, and test whether the shift improves the overall log-likelihood. A simple illustration of the stepwise iterative maximum likelihood (SIML) method is given in Fig. 1.

Fig. 1. An illustration of the stepwise iterative maximum likelihood method for the c = 2 cluster case. Two clusters are given with likelihood functions L₁ and L₂, respectively. The cluster centers are depicted by μ₁ and μ₂ (shown as ‘+’ inside the two clusters). The initial total likelihood L_old is the sum of the two likelihood functions (L₁ + L₂). A sample x from one cluster is checked for regrouping. It is advantageous to shift x to the other cluster only if the new likelihood (L_new = L₁^* + L₂^*) is higher than the old likelihood; i.e., L_new > L_old.

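If we define the class-based log-likelihoods of two clusters χ_i and χ_j as

$$L_{i} = \sum_{x \in χ_{i}} \log [p(x | ω_{i}) P(ω_{i})] \tag{9}$$

and

$$L_{j} = \sum_{x \in χ_{j}} \log [p(x | ω_{j}) P(ω_{j})] \tag{10}$$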

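then we would like to know how the class-based log-likelihood functions (referred to as log-likelihood functions hereafter) change if a sample is shifted from χ_i to χ_j. To this end, let us denote the means of χ_i and χ_j by μ_i and μ_j, and their covariances by Σ_i and Σ_j, respectively. The following equations describe the means and covariances:

$$μ_{i} = \frac{1}{n_{i}} \sum_{x \in χ_{i}} x, \quad μ_{j} = \frac{1}{n_{j}} \sum_{x \in χ_{j}} x \tag{11, 12}$$

and

$$Σ_{i} = \frac{1}{n_{i}} \sum_{x \in χ_{i}} (x - μ_{i}) {(x - μ_{i})}^{T}, \quad Σ_{j} = \frac{1}{n_{j}} \sum_{x \in χ_{j}} (x - μ_{j}) {(x - μ_{j})}^{T} \tag{13, 14}$$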

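Here n_i and n_j are the numbers of samples in χ_i and χ_j, respectively. If the component density is normal and we let P(ω_i) = n_i/n (where n is the total number of samples), then Eqs. 9 and 10 can be written as

$$L_{k} = - \frac{1}{2} \mathrm{tr} [Σ_{k}^{- 1} \sum_{x \in χ_{k}} (x - μ_{k}) {(x - μ_{k})}^{T}] - \frac{n_{k} d}{2} \log 2π - \frac{n_{k}}{2} \log | Σ_{k} | + n_{k} \log \frac{n_{k}}{n}, \quad k \in \{i, j\}$$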

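where tr(·) is the trace function. Since $\mathrm{tr} [Σ_{i}^{- 1} \sum_{x \in χ_{i}} (x - μ_{i}) {(x - μ_{i})}^{T}] = \mathrm{tr} (n_{i} I_{d \times d}) = n_{i} d$, we can write L_i as

$$L_{i} = - \frac{1}{2} n_{i} d - \frac{n_{i} d}{2} \log 2π - \frac{n_{i}}{2} \log | Σ_{i} | + n_{i} \log \frac{n_{i}}{n} \tag{15}$$

Similarly, we can write L_j as

$$L_{j} = - \frac{1}{2} n_{j} d - \frac{n_{j} d}{2} \log 2π - \frac{n_{j}}{2} \log | Σ_{j} | + n_{j} \log \frac{n_{j}}{n} \tag{16}$$

and the total log-likelihood for c clusters can be written as

$$L_{tot} = \sum_{k = 1}^{c} L_{k} \tag{17}$$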

where L_k is from Eq. 15 or 16.
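The closed form for a single cluster can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes NumPy, a cluster stored as an n_k × d array, and the maximum likelihood covariance (division by n_k). The function names `cluster_loglik` and `cluster_loglik_direct` are hypothetical. The first uses the closed form obtained after the trace simplification; the second sums the Gaussian log-density plus log(n_k/n) sample by sample.

```python
import numpy as np

def cluster_loglik(X, n_total):
    """Closed-form log-likelihood of one cluster (trace term replaced by n_k * d)."""
    n_k, d = X.shape
    diff = X - X.mean(axis=0)
    sigma = diff.T @ diff / n_k                  # ML covariance: divide by n_k
    _, logdet = np.linalg.slogdet(sigma)
    return (-0.5 * n_k * d * (1.0 + np.log(2.0 * np.pi))
            - 0.5 * n_k * logdet
            + n_k * np.log(n_k / n_total))

def cluster_loglik_direct(X, n_total):
    """Same quantity, summed sample by sample from the Gaussian log-density."""
    n_k, d = X.shape
    diff = X - X.mean(axis=0)
    sigma = diff.T @ diff / n_k
    _, logdet = np.linalg.slogdet(sigma)
    # Squared Mahalanobis distance of every sample to the cluster mean
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma), diff)
    log_pdf = -0.5 * (maha + d * np.log(2.0 * np.pi) + logdet)
    return float(np.sum(log_pdf + np.log(n_k / n_total)))
```

The two functions should agree up to floating-point error, because the Mahalanobis distances of a cluster's own samples sum to n_k d when the ML covariance is used.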

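If a sample $\hat{x} \in χ_{i}$ is shifted to χ_j, then the means and covariances change as follows (from Eqs. 11, 12, 13 and 14):

$$μ_{j}^{*} = μ_{j} + \frac{1}{n_{j} + 1} (\hat{x} - μ_{j}) \tag{18}$$

$$Σ_{j}^{*} = \frac{n_{j}}{n_{j} + 1} Σ_{j} + \frac{n_{j}}{{(n_{j} + 1)}^{2}} (\hat{x} - μ_{j}) {(\hat{x} - μ_{j})}^{T} \tag{19}$$

$$μ_{i}^{*} = μ_{i} - \frac{1}{n_{i} - 1} (\hat{x} - μ_{i}) \tag{20}$$

$$Σ_{i}^{*} = \frac{n_{i}}{n_{i} - 1} Σ_{i} - \frac{n_{i}}{{(n_{i} - 1)}^{2}} (\hat{x} - μ_{i}) {(\hat{x} - μ_{i})}^{T} \tag{21}$$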

where μ_i^*, μ_j^*, Σ_i^* and Σ_j^* are means and covariances after the shift.
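The updated means and covariances can be computed without revisiting every sample. Below is a minimal sketch, assuming NumPy and ML covariances (division by the cluster size). The helper names `ml_stats`, `add_sample` and `remove_sample` are hypothetical. The receiving-cluster covariance update is the expression stated in Lemma 1; the donor-cluster update is its counterpart for removing a sample.

```python
import numpy as np

def ml_stats(X):
    """ML mean and covariance of the rows of X (covariance divides by n)."""
    mu = X.mean(axis=0)
    diff = X - mu
    return mu, diff.T @ diff / X.shape[0]

def add_sample(mu, sigma, n, x):
    """Mean and covariance of a cluster of size n after it receives sample x."""
    delta = x - mu
    mu_new = mu + delta / (n + 1)
    sigma_new = (n / (n + 1)) * sigma + (n / (n + 1) ** 2) * np.outer(delta, delta)
    return mu_new, sigma_new

def remove_sample(mu, sigma, n, x):
    """Mean and covariance of a cluster of size n after it loses sample x."""
    delta = x - mu
    mu_new = mu - delta / (n - 1)
    sigma_new = (n / (n - 1)) * sigma - (n / (n - 1) ** 2) * np.outer(delta, delta)
    return mu_new, sigma_new
```

Each update costs O(d²) per shift, whereas recomputing the statistics from scratch costs O(n_k d²).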

To find the change in the log-likelihood functions L_i and L_j, we first introduce the following lemma.

Lemma 1 If a sample $\hat{x} \in χ_{i}$ is shifted to cluster χ_j and the changed covariance of χ_j is defined as $Σ_{j}^{*} = \frac{n_{j}}{n_{j} + 1} Σ_{j} + \frac{n_{j}}{{(n_{j} + 1)}^{2}} (\hat{x} - μ_{j}) {(\hat{x} - μ_{j})}^{T}$ then the determinant of Σ_j^* can be given as $| Σ_{j}^{*} | = {(\frac{n_{j}}{n_{j} + 1})}^{d} | Σ_{j} | (1 + \frac{1}{n_{j} + 1} {(\hat{x} - μ_{j})}^{T} Σ_{j}^{- 1} (\hat{x} - μ_{j}))$ .

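Proof By taking the determinant of Σ_j^*, we get

$$| Σ_{j}^{*} | = | \frac{n_{j}}{n_{j} + 1} Σ_{j} [I_{d \times d} + \frac{1}{n_{j} + 1} Σ_{j}^{- 1} (\hat{x} - μ_{j}) {(\hat{x} - μ_{j})}^{T}] | \tag{L1}$$

Since |AB| = |A||B| for m × m square matrices and |cA| = c^m|A| for a scalar c, we can write Eq. L1 as

$$| Σ_{j}^{*} | = {(\frac{n_{j}}{n_{j} + 1})}^{d} | Σ_{j} | \, | I_{d \times d} + \frac{1}{n_{j} + 1} Σ_{j}^{- 1} (\hat{x} - μ_{j}) {(\hat{x} - μ_{j})}^{T} | \tag{L2}$$

From Sylvester’s determinant theorem, for rectangular matrices $A \in ℝ^{m \times n}$ and $B \in ℝ^{n \times m}$, $| I_{m \times m} + A B |$ is equal to $| I_{n \times n} + B A |$. Therefore, we can write

$$| I_{d \times d} + \frac{1}{n_{j} + 1} Σ_{j}^{- 1} (\hat{x} - μ_{j}) {(\hat{x} - μ_{j})}^{T} | = | 1 + \frac{1}{n_{j} + 1} {(\hat{x} - μ_{j})}^{T} Σ_{j}^{- 1} (\hat{x} - μ_{j}) | = 1 + \frac{1}{n_{j} + 1} {(\hat{x} - μ_{j})}^{T} Σ_{j}^{- 1} (\hat{x} - μ_{j}) \tag{L3}$$

since |c| = c for a scalar c.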

Substituting the right-hand side of Eq. L3 into Eq. L2 proves the lemma.
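Lemma 1 is easy to sanity-check numerically. The sketch below is an illustration with random data, not part of the original method. It assumes NumPy; the function name `det_after_add` and the test data are hypothetical. It compares the determinant of the updated covariance, computed directly, with the closed form stated in the lemma.

```python
import numpy as np

def det_after_add(sigma, mu, n, x):
    """Closed-form determinant of the covariance after adding x (Lemma 1)."""
    d = sigma.shape[0]
    delta = x - mu
    maha = delta @ np.linalg.solve(sigma, delta)   # (x - mu)^T Sigma^{-1} (x - mu)
    return (n / (n + 1)) ** d * np.linalg.det(sigma) * (1.0 + maha / (n + 1))

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))          # hypothetical receiving cluster
x_hat = rng.normal(size=4)            # hypothetical sample being shifted in
n = X.shape[0]
mu = X.mean(axis=0)
sigma = (X - mu).T @ (X - mu) / n     # ML covariance

# Updated covariance exactly as defined in the statement of Lemma 1
delta = x_hat - mu
sigma_star = (n / (n + 1)) * sigma + (n / (n + 1) ** 2) * np.outer(delta, delta)

direct = np.linalg.det(sigma_star)
closed = det_after_add(sigma, mu, n, x_hat)
```

The closed form needs only one linear solve against the old covariance, so the determinant of the updated matrix never has to be recomputed from scratch.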

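Similarly to Lemma 1, the determinant of the changed covariance of χ_i can be written as

$$| Σ_{i}^{*} | = {(\frac{n_{i}}{n_{i} - 1})}^{d} | Σ_{i} | (1 - \frac{1}{n_{i} - 1} {(\hat{x} - μ_{i})}^{T} Σ_{i}^{- 1} (\hat{x} - μ_{i})) \tag{22}$$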

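We can now observe the change in L_j (Eq. 16) due to the shift of a sample $\hat{x}$ from χ_i to χ_j as

$$L_{j}^{*} = - \frac{1}{2} (n_{j} + 1) d - \frac{(n_{j} + 1) d}{2} \log 2π - \frac{n_{j} + 1}{2} \log | Σ_{j}^{*} | + (n_{j} + 1) \log \frac{n_{j} + 1}{n} \tag{23}$$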

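From Lemma 1 and Eq. 16, we can rewrite Eq. 23 after algebraic manipulation as

$$L_{j}^{*} = L_{j} + ΔL_{j} - C \tag{24}$$

where ΔL_j is given by

$$ΔL_{j} = - \frac{1}{2} \log | Σ_{j} | - \frac{n_{j} + 1}{2} \log (1 + \frac{1}{n_{j} + 1} {(\hat{x} - μ_{j})}^{T} Σ_{j}^{- 1} (\hat{x} - μ_{j})) + \frac{(n_{j} + 1) d}{2} \log \frac{n_{j} + 1}{n_{j}} + (n_{j} + 1) \log \frac{n_{j} + 1}{n} - n_{j} \log \frac{n_{j}}{n} \tag{25}$$

and the constant C is given by

$$C = \frac{d}{2} (1 + \log 2π) \tag{26}$$

In a similar manner, the change in L_i can be obtained by using Eqs. 15 and 22 as

$$L_{i}^{*} = L_{i} - ΔL_{i} + C \tag{27}$$

where ΔL_i is given by

$$ΔL_{i} = - \frac{1}{2} \log | Σ_{i} | + \frac{n_{i} - 1}{2} \log (1 - \frac{1}{n_{i} - 1} {(\hat{x} - μ_{i})}^{T} Σ_{i}^{- 1} (\hat{x} - μ_{i})) + \frac{(n_{i} - 1) d}{2} \log \frac{n_{i}}{n_{i} - 1} - (n_{i} - 1) \log \frac{n_{i} - 1}{n} + n_{i} \log \frac{n_{i}}{n} \tag{28}$$

and C is the same as in Eq. 26.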
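The changes in L_j and L_i can be verified against a brute-force recomputation. The following is a hedged sketch, assuming NumPy, Gaussian clusters with ML covariances, and the closed-form cluster log-likelihood with P(ω_k) = n_k/n. The function names `loglik`, `gain_receiver` and `gain_donor` are hypothetical. They return the full differences L_j^* − L_j and L_i^* − L_i (constant included), using only the old statistics and the determinant identities of Lemma 1 and its counterpart for the donor cluster.

```python
import numpy as np

def loglik(X, n_total):
    """Closed-form log-likelihood of one cluster with ML covariance."""
    n_k, d = X.shape
    diff = X - X.mean(axis=0)
    _, logdet = np.linalg.slogdet(diff.T @ diff / n_k)
    return (-0.5 * n_k * d * (1 + np.log(2 * np.pi))
            - 0.5 * n_k * logdet + n_k * np.log(n_k / n_total))

def _stats(X, x):
    """Size, dimension, log|Sigma| and squared Mahalanobis distance of x."""
    n_k, d = X.shape
    mu = X.mean(axis=0)
    sigma = (X - mu).T @ (X - mu) / n_k
    _, logdet = np.linalg.slogdet(sigma)
    maha = (x - mu) @ np.linalg.solve(sigma, x - mu)
    return n_k, d, logdet, maha

def gain_receiver(Xj, x, n_total):
    """L_j^* - L_j when the cluster with rows Xj receives x."""
    n_j, d, logdet, maha = _stats(Xj, x)
    logdet_star = d * np.log(n_j / (n_j + 1)) + logdet + np.log1p(maha / (n_j + 1))
    const = 0.5 * d * (1 + np.log(2 * np.pi))
    return (-const - 0.5 * (n_j + 1) * logdet_star + 0.5 * n_j * logdet
            + (n_j + 1) * np.log((n_j + 1) / n_total) - n_j * np.log(n_j / n_total))

def gain_donor(Xi, x, n_total):
    """L_i^* - L_i when the cluster with rows Xi (which contains x) loses x."""
    n_i, d, logdet, maha = _stats(Xi, x)
    logdet_star = d * np.log(n_i / (n_i - 1)) + logdet + np.log1p(-maha / (n_i - 1))
    const = 0.5 * d * (1 + np.log(2 * np.pi))
    return (const - 0.5 * (n_i - 1) * logdet_star + 0.5 * n_i * logdet
            + (n_i - 1) * np.log((n_i - 1) / n_total) - n_i * np.log(n_i / n_total))
```

The constant enters the two differences with opposite signs, so it cancels when they are summed to obtain the change in the total log-likelihood.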

By adding Eqs. 24 and 27, we obtain the new total log-likelihood (L_tot^*), since no cluster other than χ_i and χ_j changes; i.e., from Eqs. 17, 24 and 27 we have

$$L_{tot}^{*} = L_{tot} + ΔL_{j} - ΔL_{i} = L_{tot} + ΔL_{tot} \tag{29}$$

where ΔL_tot = ΔL_j − ΔL_i. Therefore, the shift of a sample $\hat{x}$ is advantageous if ΔL_tot > 0. This gives the following algorithm (Table 1):

Table 1. Stepwise iterative maximum likelihood method procedure
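The shift-and-test idea can be sketched as a short loop. This is an illustrative sketch only and does not reproduce the steps of Table 1. It assumes NumPy, Gaussian clusters with ML covariances, an initial labelling in which every cluster has at least d + 2 samples, and a greedy sweep over samples. The names `siml_sketch`, `max_sweeps` and `tol`, and the minimum cluster size guard, are assumptions of this sketch. For clarity it recomputes the total log-likelihood after each trial shift instead of using the closed-form ΔL_tot.

```python
import numpy as np

def cluster_loglik(X, n_total):
    """Closed-form log-likelihood of one cluster with ML covariance."""
    n_k, d = X.shape
    diff = X - X.mean(axis=0)
    _, logdet = np.linalg.slogdet(diff.T @ diff / n_k)
    return (-0.5 * n_k * d * (1 + np.log(2 * np.pi))
            - 0.5 * n_k * logdet + n_k * np.log(n_k / n_total))

def total_loglik(X, labels, c):
    """Sum of the cluster log-likelihoods over c clusters."""
    n = X.shape[0]
    return sum(cluster_loglik(X[labels == k], n) for k in range(c))

def siml_sketch(X, labels, c, max_sweeps=50, tol=1e-9):
    """Greedy sample shifting: accept a shift only if the total log-likelihood rises."""
    X = np.asarray(X, dtype=float)
    labels = np.array(labels, dtype=int)      # work on a copy
    n, d = X.shape
    min_size = d + 2                          # assumption: keeps covariances non-singular
    for _ in range(max_sweeps):
        moved = False
        for s in range(n):
            i = labels[s]
            if np.sum(labels == i) <= min_size:
                continue                      # donor cluster too small to give up a sample
            base = total_loglik(X, labels, c)
            best_gain, best_j = tol, i
            for j in range(c):
                if j == i:
                    continue
                labels[s] = j                 # trial shift of sample s to cluster j
                gain = total_loglik(X, labels, c) - base
                if gain > best_gain:
                    best_gain, best_j = gain, j
            labels[s] = best_j                # keep the best shift, or restore cluster i
            moved = moved or (best_j != i)
        if not moved:
            break                             # no shift improves the total log-likelihood
    return labels
```

Because a shift is accepted only when the gain is strictly positive, the total log-likelihood never decreases from one sweep to the next. Replacing the recomputation with the closed-form change would reduce the cost of each trial shift to a pair of O(d²) updates plus one linear solve per cluster.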

The following sections discuss the characteristics of the SIML method.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
