Multiset sparse partial least squares path modeling

Attila Csala
Aeilko H. Zwinderman
Michel H. Hof

Multiset sparse Partial Least Squares path modeling (msPLS) is a multivariate approach for modelling the relationship between Q related data sources (X_1, …, X_q, …, X_Q) with the help of latent variables (LVs). Each data source contains p_q manifest variables (MVs) measured on the same n samples (i.e. X_q ∈ ℝ^(n×p_q)), and each data source is assigned to its corresponding LV (ζ_1, …, ζ_q, …, ζ_Q). The LVs are linear combinations of their MVs (ζ_q = X_q w_q, where ζ_q ∈ ℝ^(n×1) and w_q ∈ ℝ^(p_q×1)). The relationship between the data sources is encoded in a connectivity matrix, as in Partial Least Squares path modeling (PLS-PM), and is modelled through a multiple regression model between the LVs:

    ζ_q = Σ_{m=1}^{M_q} θ_{qm} ζ_{mq} + v_q,    (1)

where Σ_{m=1}^{M_q} ζ_{mq} denotes the sum over the M_q LVs that are explanatory for ζ_q, θ_{qm} is the coefficient capturing the effect of the m-th explanatory LV ζ_{mq} on ζ_q, and v_q is white noise, following the notation of [22, 24] for PLS-PM. A full description of the PLS-PM algorithm can be found in [24] (Algorithm 6). The weight vectors w_q are estimated as

    w_q = (1/n) X_q^T ζ_q,    (2)
or as

    w_q = (X_q^T X_q)^{-1} X_q^T ζ_q,    (3)

depending on the mode of the regression. PLS-PM denotes Eq. (2) as Mode A and Eq. (3) as Mode B regression. For msPLS, Mode A (i.e. multiple univariate regressions) is used for the weight vectors of MVs that do not have any response MVs, and Mode B (i.e. multivariate regression) is used for the weight vectors of MVs that do have response MVs. The objective functions of PLS-PM are described in [22, 24], and the objective function for msPLS is given by Eq. (5) in the “General case” section.
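To make the two estimation modes concrete, the following is a minimal numpy sketch with simulated data (an illustration only, not the authors' implementation): the Mode A weights are proportional to the covariances between the MVs and the LV, while the Mode B weights come from one multivariate regression of the LV on all MVs jointly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 5  # n samples, p_q manifest variables in block q

# Simulated data block X_q and a current estimate of its latent variable
# zeta_q (in the full PLS-PM algorithm, zeta_q comes from the inner
# estimation step).
X_q = rng.standard_normal((n, p))
zeta_q = rng.standard_normal(n)

# Mode A (Eq. 2): multiple univariate regressions -- each weight is
# proportional to the covariance between one MV and the LV.
w_mode_a = X_q.T @ zeta_q / n

# Mode B (Eq. 3): one multivariate regression of the LV on the MVs, which
# requires inverting the covariance matrix X_q^T X_q (possible only when it
# is full rank, i.e. not in the high dimensional p_q >> n setting).
w_mode_b = np.linalg.solve(X_q.T @ X_q, X_q.T @ zeta_q)

# PLS-PM rescales the weights so the resulting LV is standardised; here we
# simply normalise both weight vectors to unit length for comparison.
w_mode_a /= np.linalg.norm(w_mode_a)
w_mode_b /= np.linalg.norm(w_mode_b)
```

Note that the two modes generally give different weight vectors; they coincide only when the MVs within the block are uncorrelated.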

In a high dimensional setting (i.e. p_q ≫ n), the covariance matrix of X_q in Eq. (3) is non-invertible. To solve this problem, we propose to replace Eq. (3) with Elastic Net (ENet) penalisation. Replacing the ordinary least squares estimator in Eq. (3) with ENet penalisation has two advantages: not only do we overcome the multicollinearity issue encountered in a high dimensional setting, but ENet also enforces sparse variable selection, which eases the interpretability of the final model. Equation (3) then becomes

    w_q = argmin_{w_q} ||ζ_q − X_q w_q||₂² + λ_1 ||w_q||₁ + λ_2 ||w_q||₂²,    (4)

where λ_1 denotes the LASSO penalty parameter and λ_2 denotes the Ridge penalty parameter [27].
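As a rough sketch of this penalised estimation step (an illustration with simulated data, not the authors' code), scikit-learn's ElasticNet can play the role of the penalised estimator; note that scikit-learn parameterises the combined penalty through alpha and l1_ratio rather than separate λ_1 and λ_2 values, so the correspondence is up to a reparameterisation.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(2)
n, p = 30, 200  # high dimensional: p_q >> n, so X_q^T X_q is singular

# Simulated block in which only the first 5 MVs drive the latent variable.
X_q = rng.standard_normal((n, p))
zeta_q = X_q[:, :5] @ np.ones(5) + 0.1 * rng.standard_normal(n)

# Elastic Net estimate of w_q. scikit-learn minimises
#   (1/(2n)) ||y - Xw||^2 + alpha * l1_ratio * ||w||_1
#                         + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2,
# so (alpha, l1_ratio) jointly encode the (lambda_1, lambda_2) pair.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X_q, zeta_q)
w_q = enet.coef_

# The LASSO part of the penalty zeroes out most weights, giving the sparse
# variable selection described above.
print(np.count_nonzero(w_q), "of", p, "weights are non-zero")
```

Despite X_q^T X_q being singular here, the penalised problem has a unique solution, and most entries of w_q are shrunk exactly to zero.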
