Canonical Correlation Analysis (CCA) is a common statistical learning method for analyzing paired data. It learns a projection for each of the two representations such that the projected data are maximally correlated in the reduced-dimensional space. Suppose $X \in \mathbb{R}^{n \times p}$ (n samples, p features) and $Y \in \mathbb{R}^{n \times q}$ (n samples, q features) represent two types of omics data from the same cancer, and that the columns of X and Y are centered and scaled to zero mean and unit variance. Then the CCA model can be written as follows:

$$\max_{u,v}\; u^\top X^\top Y v \quad \text{s.t.} \quad u^\top X^\top X u = 1,\; v^\top Y^\top Y v = 1.$$
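As a minimal sketch in plain Python with no external libraries, the standardization step and the quantity that CCA maximizes can be written as below. The helper names `standardize`, `project`, and `canonical_correlation` are hypothetical. Using the sample variance (dividing by n − 1) is an assumption, because the text only says "unit variance". Since the columns are centered, Xu and Yv have zero mean. Consequently, the Pearson correlation of the two projections equals $u^\top X^\top Y v$ once u and v are rescaled to satisfy the constraints.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major: n samples x p features


def standardize(M: Matrix) -> Matrix:
    """Center each column to zero mean and scale it to unit sample variance.

    Assumption: variance uses the n - 1 denominator; constant columns are not handled.
    """
    n, p = len(M), len(M[0])
    out = [row[:] for row in M]
    for j in range(p):
        col = [M[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        for i in range(n):
            out[i][j] = (M[i][j] - mean) / sd
    return out


def project(M: Matrix, w: List[float]) -> List[float]:
    """Return the projection M @ w as a list of n scores."""
    return [sum(m * wj for m, wj in zip(row, w)) for row in M]


def canonical_correlation(X: Matrix, Y: Matrix, u: List[float], v: List[float]) -> float:
    """Pearson correlation between the projections Xu and Yv.

    Assumes X and Y are already column-centered (e.g. via standardize), so
    Xu and Yv have zero mean and the correlation equals
    u^T X^T Y v / (||Xu|| * ||Yv||).
    """
    a, b = project(X, u), project(Y, v)
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den
```

For example, `canonical_correlation(standardize(X), standardize(Y), u, v)` scores a candidate pair of canonical vectors. CCA searches for the pair that makes this value as large as possible.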
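Suppose $X^\top X = I$ and $Y^\top Y = I$, where $I$ is the identity matrix (that is, the within-set covariance matrices are replaced by the identity). Then the above model reduces to:

$$\max_{u,v}\; u^\top X^\top Y v \quad \text{s.t.} \quad u^\top u = 1,\; v^\top v = 1,$$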
which is called diagonal CCA and usually performs better than traditional CCA on high-dimensional data [31, 32]. However, classical CCA yields dense (non-sparse) canonical vectors, which makes feature selection and biological interpretation difficult. To address this, many sparse CCA models have been proposed that obtain sparse canonical vectors through different penalty functions [16, 33–37]. Specifically, a sparse CCA (SCCA) with an $\ell_0$-norm constraint [35] can be formulated as the following optimization problem:

$$\max_{u,v}\; u^\top X^\top Y v \quad \text{s.t.} \quad u^\top u = 1,\; v^\top v = 1,\; \|u\|_0 \le k_u,\; \|v\|_0 \le k_v,$$
where $k_u$ and $k_v$ are two parameters that control the sparsity of the canonical vectors $u$ and $v$, respectively, and $\|u\|_0$ denotes the number of nonzero elements in $u$ (likewise $\|v\|_0$ for $v$).
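One way to approach the $\ell_0$-constrained problem is alternating maximization. For a fixed v, the best u with $u^\top u = 1$ and $\|u\|_0 \le k_u$ keeps the $k_u$ largest-magnitude entries of $X^\top Y v$, zeros the rest, and normalizes. The update for v is symmetric. The sketch below implements this scheme as a truncated power iteration. It illustrates the idea under stated assumptions and is not necessarily the algorithm used in [35]. The function names (`cross_product`, `hard_threshold`, `l0_scca`), the uniform starting vector, and the stopping rule (`max_iter`, `tol`) are hypothetical choices.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # row-major: n samples x features


def cross_product(X: Matrix, Y: Matrix) -> Matrix:
    """Z = X^T Y (p x q) for row-major X (n x p) and Y (n x q)."""
    n, p, q = len(X), len(X[0]), len(Y[0])
    return [[sum(X[i][a] * Y[i][b] for i in range(n)) for b in range(q)]
            for a in range(p)]


def hard_threshold(w: List[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries of w, zero the rest, scale to unit l2-norm."""
    keep = set(sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)[:k])
    t = [w[j] if j in keep else 0.0 for j in range(len(w))]
    norm = math.sqrt(sum(x * x for x in t))
    return [x / norm for x in t] if norm > 0 else t


def l0_scca(X: Matrix, Y: Matrix, k_u: int, k_v: int,
            max_iter: int = 100, tol: float = 1e-8) -> Tuple[List[float], List[float], float]:
    """Alternating hard-thresholded power iteration for
    max u^T X^T Y v  s.t.  u^T u = v^T v = 1, ||u||_0 <= k_u, ||v||_0 <= k_v.
    """
    Z = cross_product(X, Y)
    p, q = len(Z), len(Z[0])
    v = [1.0 / math.sqrt(q)] * q  # hypothetical start: uniform unit vector
    u = [0.0] * p
    for _ in range(max_iter):
        # Exact solution of the u-subproblem for the current v.
        u_new = hard_threshold([sum(Z[a][b] * v[b] for b in range(q)) for a in range(p)], k_u)
        # Exact solution of the v-subproblem for the new u.
        v_new = hard_threshold([sum(Z[a][b] * u_new[a] for a in range(p)) for b in range(q)], k_v)
        delta = max(abs(x - y) for x, y in zip(u_new + v_new, u + v))
        u, v = u_new, v_new
        if delta < tol:
            break
    objective = sum(u[a] * Z[a][b] * v[b] for a in range(p) for b in range(q))
    return u, v, objective
```

When `k_u = p` and `k_v = q`, the thresholding reduces to plain normalization. The same loop then becomes the power method for the leading singular vectors of $X^\top Y$, which solve the diagonal CCA problem. Each update cannot decrease $u^\top X^\top Y v$. However, the result is only a local solution and may depend on the starting vector.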