2.2. Feature Extraction Using PPCA

After the preprocessing stage, the SELDI-TOF-MS data set was still high dimensional. Extracting features with dimension reduction techniques not only simplifies the structure of the prediction model but also speeds up training and testing. PCA is a commonly used dimension reduction technique based on the principle of minimizing the reconstruction error: it replaces the massive original data with a small number of principal components. However, PCA lacks a probabilistic model structure and does not account for higher-order statistics. PPCA, proposed by Tipping and Bishop [16], constrains the factor loading matrix of a latent variable model with a noise variance estimated from the principal components discarded by traditional PCA, and then obtains the optimal probability model by estimating the parameters with the expectation-maximization (EM) algorithm. Consequently, PPCA can find the directions of the principal components in high-dimensional data more effectively and can extract features more efficiently.

Suppose that the dimension of an observation data set {Sₙ, n = 1, 2, …, N} is d and the number of samples is N. For one sample, the latent variable model expresses the relationship between the observed data S and the latent variable X as
$$S = WX + \mu + \varepsilon, \tag{1}$$
where W is a d × q factor loading matrix, X is a q-dimensional latent variable, μ = (1/N)∑ₙ₌₁ᴺ Sₙ is the nonzero mean, and ε is the error term. Assuming X ~ N(0, I) and ε ~ N(0, σ²I), the probability distribution of S conditioned on X follows from (1) as
$$p(S \mid X) = N\!\left(S \mid WX + \mu,\ \sigma^{2}I\right). \tag{2}$$
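As a concrete illustration of the generative model in (1) and (2), the short NumPy sketch below draws synthetic observations from a PPCA model; the dimensions, noise level, and random loading matrix are placeholder assumptions for illustration only, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

d, q, N = 100, 5, 200          # hypothetical sizes: d observed features, q latents, N samples
sigma2 = 0.1                   # hypothetical noise variance sigma^2
W = rng.normal(size=(d, q))    # factor loading matrix (random stand-in; normally estimated by EM)
mu = rng.normal(size=d)        # nonzero mean vector

X = rng.normal(size=(N, q))                            # latent variables, X ~ N(0, I)
eps = rng.normal(scale=np.sqrt(sigma2), size=(N, d))   # noise, eps ~ N(0, sigma^2 I)
S = X @ W.T + mu + eps                                 # observations via eq. (1): S = W X + mu + eps
```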
If the prior probability model of X conforms to a Gaussian distribution,
$$p(X) = N(X \mid 0,\ I), \tag{3}$$
then the probability distribution of S can be expressed as
$$p(S) = N(S \mid \mu,\ C), \tag{4}$$
where C = WWᵀ + σ²I is a d × d matrix. By using Bayes' rule, we can derive the posterior probability distribution of X given S:
$$p(X \mid S) = N\!\left(X \mid M^{-1}W^{T}(S - \mu),\ \sigma^{2}M^{-1}\right), \tag{5}$$
where M = WᵀW + σ²I is a q × q matrix. Under this model, the log-likelihood function of S can be expressed as
$$L = -\frac{N}{2}\left\{ d\ln(2\pi) + \ln\lvert C\rvert + \operatorname{tr}\!\left(C^{-1}U\right) \right\}, \tag{6}$$
where U = (1/N)∑ₙ₌₁ᴺ (Sₙ − μ)(Sₙ − μ)ᵀ is the covariance matrix of the observations; the maximum likelihood estimates can then be obtained through the EM algorithm:
$$\tilde{W} = UW\left(\sigma^{2}I + M^{-1}W^{T}UW\right)^{-1}, \tag{7}$$

$$\tilde{\sigma}^{2} = \frac{1}{d}\operatorname{tr}\!\left(U - UWM^{-1}\tilde{W}^{T}\right), \tag{8}$$
where W is the old value of the parameter matrix and W̃ is the revised estimate calculated from (7). Substituting the parameters obtained from (7) and (8) into (1), we derive the latent variable X̃ₙ, which is the dimensionality-reduced form of the observation Sₙ:
$$\tilde{X}_{n} = \left(\tilde{W}^{T}\tilde{W} + \tilde{\sigma}^{2}I\right)^{-1}\tilde{W}^{T}\left(S_{n} - \mu\right), \tag{9}$$
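A minimal sketch of the projection in (9), assuming the loading matrix, noise variance, and mean have already been estimated; the parameter values below are random placeholders standing in for a fitted model.

```python
import numpy as np

def ppca_project(S, W, sigma2, mu):
    """Posterior-mean projection onto the latent space, eq. (9)."""
    q = W.shape[1]
    M = W.T @ W + sigma2 * np.eye(q)                 # q x q matrix M = W^T W + sigma^2 I
    return np.linalg.solve(M, W.T @ (S - mu).T).T    # X_n = M^{-1} W^T (S_n - mu), one row per sample

# Placeholder parameters purely for illustration (in practice they come from the EM fit).
rng = np.random.default_rng(1)
d, q, N = 100, 5, 20
W, mu, sigma2 = rng.normal(size=(d, q)), rng.normal(size=d), 0.1
S = rng.normal(size=(N, d))
X_reduced = ppca_project(S, W, sigma2, mu)           # reduced features, shape (N, q)
```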
From (9), we can reconstruct the observation data S̃ₙ via X̃ₙ:
$$\tilde{S}_{n} = \tilde{W}\tilde{X}_{n} + \mu. \tag{10}$$
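To tie the steps together, the following NumPy sketch iterates the EM updates (7) and (8), evaluates the log-likelihood (6), and then applies the projection (9) and reconstruction (10). It is a minimal illustration under assumed settings (random initialization, a fixed number of iterations, synthetic data shapes), not the exact implementation used in this work.

```python
import numpy as np

def ppca_em(S, q, n_iter=200, seed=0):
    """Fit PPCA by EM; returns loading matrix W, noise variance sigma2, mean mu, and log-likelihood."""
    N, d = S.shape
    mu = S.mean(axis=0)                                # sample mean of the observations
    U = (S - mu).T @ (S - mu) / N                      # covariance matrix U of the observations
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(d, q))                        # random initialization (an assumption)
    sigma2 = 1.0
    for _ in range(n_iter):
        M_inv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))  # M^{-1}, with M = W^T W + sigma^2 I
        # EM updates, eqs. (7) and (8)
        W_new = U @ W @ np.linalg.inv(sigma2 * np.eye(q) + M_inv @ W.T @ U @ W)
        sigma2 = np.trace(U - U @ W @ M_inv @ W_new.T) / d
        W = W_new
    C = W @ W.T + sigma2 * np.eye(d)                   # C = W W^T + sigma^2 I
    loglik = -N / 2 * (d * np.log(2 * np.pi)           # log-likelihood, eq. (6)
                       + np.linalg.slogdet(C)[1]
                       + np.trace(np.linalg.solve(C, U)))
    return W, sigma2, mu, loglik

# Synthetic stand-in for preprocessed spectra: N = 60 samples, d = 300 m/z features (hypothetical sizes).
rng = np.random.default_rng(2)
S = rng.normal(size=(60, 300))
W, sigma2, mu, loglik = ppca_em(S, q=10)

M = W.T @ W + sigma2 * np.eye(W.shape[1])
X_tilde = np.linalg.solve(M, W.T @ (S - mu).T).T       # eq. (9): reduced features, shape (60, 10)
S_tilde = X_tilde @ W.T + mu                           # eq. (10): reconstructed observations
```

In practice, the fixed iteration count could be replaced by a convergence criterion on the log-likelihood (6).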