Multiset sparse partial least squares path modeling

Attila Csala
Aeilko H. Zwinderman
Michel H. Hof

Multiset sparse Partial Least Squares path modeling (msPLS) is a multivariate approach for modelling the relationship between Q related data sources (X_1, …, X_q, …, X_Q) with the help of latent variables (LVs). Each data source contains p_q manifest variables (MVs) measured on the same n samples (i.e. X_q ∈ ℝ^(n×p_q)), and each data source is assigned to its corresponding LV (ζ_1, …, ζ_q, …, ζ_Q). The LVs are linear combinations of their MVs (ζ_q = X_q w_q, where ζ_q ∈ ℝ^(n×1) and w_q ∈ ℝ^(p_q×1)). The relationship between the data sources is encoded in a connectivity matrix, as in Partial Least Squares path modeling (PLS-PM), and is modelled through a multiple regression model between the LVs:

    ζ_q = Σ_{m=1}^{M_q} θ_{qm} ζ_{mq} + v_q,    (1)

where Σ_{m=1}^{M_q} ζ_{mq} denotes the sum over the M_q LVs that are explanatory for ζ_q, θ_{qm} is the coefficient capturing the effect of the m-th explanatory LV ζ_{mq} on ζ_q, and v_q is white noise, following the notation of [22, 24] for PLS-PM. A full description of the PLS-PM algorithm can be found in [24] (Algorithm 6). The weight vectors w_q are estimated as

    w_q = (1/n) X_q^T ζ_q,    (2)
or as

    w_q = (X_q^T X_q)^{-1} X_q^T ζ_q,    (3)

depending on the mode of the regression. PLS-PM denotes Eq. (2) as Mode A and Eq. (3) as Mode B regression. For msPLS, Mode A (i.e. multiple univariate regressions) is used for the weight vectors of MVs that do not have any response MVs, and Mode B (i.e. multivariate regression) is used for the weight vectors of MVs that do have response MVs. The objective functions of PLS-PM are described in [22, 24], and the objective function for msPLS is given by Eq. (5) in the “General case” section.
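To make the two estimation modes concrete, the following is a minimal numpy sketch with simulated data (an illustration only, not the authors' implementation): the Mode A weights are proportional to the covariances between the MVs and the LV, while the Mode B weights come from one multivariate regression of the LV on all MVs jointly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 5  # n samples, p_q manifest variables in block q

# Simulated data block X_q and a current estimate of its latent variable
# zeta_q (in the full PLS-PM algorithm, zeta_q comes from the inner
# estimation step).
X_q = rng.standard_normal((n, p))
zeta_q = rng.standard_normal(n)

# Mode A (Eq. 2): multiple univariate regressions -- each weight is
# proportional to the covariance between one MV and the LV.
w_mode_a = X_q.T @ zeta_q / n

# Mode B (Eq. 3): one multivariate regression of the LV on the MVs, which
# requires inverting the covariance matrix X_q^T X_q (possible only when it
# is full rank, i.e. not in the high dimensional p_q >> n setting).
w_mode_b = np.linalg.solve(X_q.T @ X_q, X_q.T @ zeta_q)

# PLS-PM rescales the weights so the resulting LV is standardised; here we
# simply normalise both weight vectors to unit length for comparison.
w_mode_a /= np.linalg.norm(w_mode_a)
w_mode_b /= np.linalg.norm(w_mode_b)
```

Note that the two modes generally give different weight vectors; they coincide only when the MVs within the block are uncorrelated.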

In a high dimensional setting (i.e. p_q ≫ n), the covariance matrix of X_q in Eq. (3) is non-invertible. To solve this problem, we propose to replace Eq. (3) with Elastic Net (ENet) penalisation. Replacing the ordinary least squares estimator in Eq. (3) with ENet penalisation has two advantages: not only do we overcome the multicollinearity issue encountered in a high dimensional setting, but ENet also enforces sparse variable selection, which eases the interpretability of the final model. Equation (3) then becomes

    w_q = argmin_{w_q} ||ζ_q − X_q w_q||₂² + λ_1 ||w_q||₁ + λ_2 ||w_q||₂²,    (4)

where λ_1 denotes the LASSO penalty parameter and λ_2 denotes the Ridge penalty parameter [27].
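As a rough sketch of this penalised estimation step (an illustration with simulated data, not the authors' code), scikit-learn's ElasticNet can play the role of the penalised estimator; note that scikit-learn parameterises the combined penalty through alpha and l1_ratio rather than separate λ_1 and λ_2 values, so the correspondence is up to a reparameterisation.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(2)
n, p = 30, 200  # high dimensional: p_q >> n, so X_q^T X_q is singular

# Simulated block in which only the first 5 MVs drive the latent variable.
X_q = rng.standard_normal((n, p))
zeta_q = X_q[:, :5] @ np.ones(5) + 0.1 * rng.standard_normal(n)

# Elastic Net estimate of w_q. scikit-learn minimises
#   (1/(2n)) ||y - Xw||^2 + alpha * l1_ratio * ||w||_1
#                         + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2,
# so (alpha, l1_ratio) jointly encode the (lambda_1, lambda_2) pair.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X_q, zeta_q)
w_q = enet.coef_

# The LASSO part of the penalty zeroes out most weights, giving the sparse
# variable selection described above.
print(np.count_nonzero(w_q), "of", p, "weights are non-zero")
```

Despite X_q^T X_q being singular here, the penalised problem has a unique solution, and most entries of w_q are shrunk exactly to zero.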
