2.2. PACA: Phenotype-Aware Component Analysis


Using a singular value decomposition, we can consider an alternative formulation for the model in (1)-(2):

$$X = U_0 \Lambda_{X_0} V_{X_0}^\top + U_1 \Lambda_{X_1} V_{X_1}^\top, \qquad Y = U_0 \Lambda_{Y_0} V_{Y_0}^\top,$$
where each of $U_0 \in \mathbb{R}^{m \times k_0}$, $V_{X_0} \in \mathbb{R}^{n_1 \times k_0}$, $V_{Y_0} \in \mathbb{R}^{n_0 \times k_0}$, $U_1 \in \mathbb{R}^{m \times k_1}$, and $V_{X_1} \in \mathbb{R}^{n_1 \times k_1}$ forms an orthonormal basis, and $\Lambda_{X_0}$, $\Lambda_{X_1}$, $\Lambda_{Y_0}$ are diagonal matrices of singular values. Under this representation, we are interested in learning $V_{X_1}$, a linearly-transformed surrogate for $Z_{X_1}$ in Eq. (1). Given the directions of the shared sources of variation $U_0$, note that

$$\left(I_m - U_0 U_0^\top\right) X = U_1 \Lambda_{X_1} V_{X_1}^\top, \tag{6}$$
which establishes a way to remove the expected effect that $U_0$ induces on X (i.e., the signals in X that come from sources of variation that also exist in controls). Our PACA algorithm therefore estimates and removes the effects of $U_0$ on X, and then applies PCA to the residual in order to capture $V_{X_1}$. Our main focus is thus estimating $U_0$.
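For illustration, the following is a minimal sketch of this residualize-then-PCA step in Python with NumPy; it is not the authors' released implementation, and the function name and interface are ours.

```python
import numpy as np

def residualize_and_pca(X, U0_hat, n_components=1):
    """Remove the span of U0_hat (m x r, orthonormal columns) from the columns
    of X (m features x n1 cases), then take the top PCs of the residual."""
    X_resid = X - U0_hat @ (U0_hat.T @ X)                     # (I - U0 U0^T) X
    X_resid = X_resid - X_resid.mean(axis=1, keepdims=True)   # center each feature
    # PCA via SVD; the top right singular vectors (length n1) act as a
    # surrogate for V_X1, the case-specific sample-level variation.
    U, s, Vt = np.linalg.svd(X_resid, full_matrices=False)
    return Vt[:n_components].T, s[:n_components]
```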

Since $U_0$ appears in both X and Y, we can estimate it by employing a canonical correlation analysis (CCA) [59]. Importantly, unlike in a typical application of CCA, where we wish to find linear transformations of the features that yield the highest correlation between the samples in two datasets (i.e., vectors in the sample space), here we seek linear transformations of the samples that provide vectors in the feature space $\mathbb{R}^m$. Specifically, we find the first pair of canonical variables by solving:

$$\left(\hat{a}, \hat{b}\right) = \underset{a \in \mathbb{R}^{n_1},\; b \in \mathbb{R}^{n_0}}{\arg\max}\; \operatorname{corr}\!\left(Xa, Yb\right). \tag{7}$$
Setting $\hat{u}^0_1 = X\hat{a}$ yields the representation in X of the strongest direction of shared variation across X and Y. Let $S_{XY}$ be the empirical sample cross-covariance of the matrices X and Y (and $S_{XX}$, $S_{YY}$ the corresponding sample covariances); the solutions for $\hat{a}$ and $\hat{b}$ are known to be given by the eigenvector corresponding to the top eigenvalue of $S_{XX}^{-1} S_{XY} S_{YY}^{-1} S_{YX}$ and the eigenvector corresponding to the top eigenvalue of $S_{YY}^{-1} S_{YX} S_{XX}^{-1} S_{XY}$, respectively [59]. This procedure can be repeated iteratively by restricting each new vector $\hat{u}^0_r$ to be orthogonal to the previous canonical variables $\hat{u}^0_1, \ldots, \hat{u}^0_{r-1}$ (and similarly for the variables of Y). Eventually, the collection of these vectors, $\hat{U}_0$, can be used for removing the shared sources of variation from X following Eq (6); see a summary of PACA in Algorithm 1. As we discuss next, the choice of the number of components r to be used can be informed by the structure induced in X and Y by the shared variation.
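A compact illustration of this CCA step is sketched below (Python/NumPy). It uses a whitening-based SVD solution, which is algebraically equivalent to the eigen-decompositions above; the small ridge term and the final orthonormalization are practical choices of this sketch rather than part of the original algorithm.

```python
import numpy as np

def shared_directions(X, Y, r, ridge=1e-8):
    """Estimate the top-r shared directions U0_hat (m x r) via CCA between the
    columns (samples) of X (m x n1 cases) and Y (m x n0 controls), treating
    the m features as observations."""
    m = X.shape[0]
    Xc = X - X.mean(axis=0)          # center each sample (column) over the features
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (m - 1) + ridge * np.eye(X.shape[1])   # covariance of X's columns
    Syy = Yc.T @ Yc / (m - 1) + ridge * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (m - 1)                                 # cross-covariance
    # Whitening-based CCA: the SVD of Kx^{-1} Sxy Ky^{-T} yields the canonical
    # pairs, equivalent to the eigen-decompositions described in the text.
    Kx = np.linalg.cholesky(Sxx)
    Ky = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Kx, Sxy) @ np.linalg.inv(Ky).T
    U, rho, Vt = np.linalg.svd(M)
    A = np.linalg.solve(Kx.T, U[:, :r])      # canonical coefficients a_1, ..., a_r
    U0_hat = Xc @ A                          # representation in X of the shared variation
    U0_hat, _ = np.linalg.qr(U0_hat)         # orthonormalize for use as a projection basis
    return U0_hat, rho[:r]
```

The resulting U0_hat (or its leading k columns) would then be passed to the residualization step sketched earlier.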

We note that the CCA step in PACA imposes the limitation m ≥ max(n0, n1). In practice, there may be datasets in which this condition does not hold. In such cases, we can, in principle, apply PACA to a subset of the samples; however, excluding data is clearly sub-optimal. Instead, we can use a randomized version of PACA that can operate under the setup m < max(n0, n1) by using multiple random subsets of the data to estimate the full sample covariance matrix of X (while accounting for the shared sources of variation between X and Y). See Supplementary Methods and Supplementary Algorithm 2 for details.

In typical settings, the parameter $k_0$, the dimension of $U_0$, is unknown. We may estimate it by setting the estimate $\hat{k}_0$ to be the largest value r that reveals a significant correlation between the subspaces of X and Y induced by $\hat{a}_r, \hat{b}_r$; this can be achieved, for example, using Bartlett's chi-squared test or Rao's approximate F statistic [60]. However, in practice, we found that in the data we analyzed $k_0$ appears to be very large (typically a few hundred). Hence, this approach can lead to over-regularization and a reduction in statistical power due to the unnecessary removal of a large number of axes of shared variation. To see why, recall that our goal is to systematically target variation of interest that is typically weaker than the dominant sources of variation in the data. While standard dimensionality reduction tools cannot achieve this when applied to the original data, we expected them to be effective in revealing the target variation when applied to a residualized (adjusted) version of the data in which all sources of variation that are stronger than the target variation have been removed. In other words, there is no need to estimate the true $k_0$ and remove a structure of this dimension from the data prior to applying dimensionality reduction. Instead, we wish to find the minimal number of axes of variation k that we need to remove in order to reveal variation that is unique to cases. Below, we provide a brief description of the algorithm; see Algorithm 2 for complete details.
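For reference, a generic sequential form of Bartlett's chi-squared test is sketched below (Python/SciPy); this is the standard textbook statistic rather than necessarily the exact procedure of [60], and, as explained above, PACA does not ultimately rely on estimating $k_0$ this way.

```python
import numpy as np
from scipy import stats

def bartlett_num_significant(rho, n_obs, p, q, alpha=0.05):
    """Sequentially test how many leading canonical correlations are non-zero.
    rho: canonical correlations in descending order; n_obs: number of
    observations they were computed over (here, the m features); p, q: number
    of variables in the two views (here, n1 cases and n0 controls)."""
    rho = np.asarray(rho, dtype=float)
    k = 0
    for r in range(len(rho)):
        # H0: the canonical correlations rho_{r+1}, ..., are all zero
        stat = -(n_obs - 1 - (p + q + 1) / 2.0) * np.sum(np.log(1.0 - rho[r:] ** 2))
        df = (p - r) * (q - r)
        if stats.chi2.sf(stat, df) < alpha:
            k = r + 1        # at least r+1 significant correlations
        else:
            break
    return k
```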

Given all shared axes of variation learned from the data using the procedure in Eq. (7), we learn k ∈ {1, …, min(n0, n1)} using a binary search, as follows. At each candidate value of k, we evaluate the variance of the top PC computed from the residualized X, after accounting for the top k shared axes of variation; this same PC, which may reflect case-specific variation, is also used to evaluate the variance it can explain in the residualized Y matrix. Comparing these variances to "null" variances obtained by permuting the loadings of the PC allows us to determine whether accounting for k shared axes of variation is sufficient to detect structure that is unique to X. This evaluation can lead to one of four scenarios, based on which we decide on the next partition of the binary search: there can be (i) significant variation in cases but not in controls, (ii) significant variation in both cases and controls, (iii) no significant variation in either cases or controls, or (iv) significant variation in controls but not in cases.
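A simplified sketch of this per-candidate-k evaluation is given below (Python/NumPy); the function and variable names are illustrative, and the permutation-based significance call is a stand-in for the exact test used in Algorithm 2.

```python
import numpy as np

def evaluate_k(X, Y, U0_hat, k, n_perm=100, alpha=0.05, seed=None):
    """Residualize X and Y on the top-k shared axes, take the top PC of the
    residualized X, and assess (via permutation of the PC loadings) whether it
    captures significant variance in cases and in controls."""
    rng = np.random.default_rng(seed)
    B = U0_hat[:, :k]
    Xr = X - B @ (B.T @ X)           # residualized cases (m x n1)
    Yr = Y - B @ (B.T @ Y)           # residualized controls (m x n0)
    # top PC (loadings over the m features) of the residualized cases
    U, s, Vt = np.linalg.svd(Xr - Xr.mean(axis=1, keepdims=True), full_matrices=False)
    pc = U[:, 0]
    var_x = np.var(pc @ Xr)          # variance the PC captures in cases
    var_y = np.var(pc @ Yr)          # variance it explains in controls
    # "null" variances: permute the PC loadings and recompute
    null_x = np.array([np.var(rng.permutation(pc) @ Xr) for _ in range(n_perm)])
    null_y = np.array([np.var(rng.permutation(pc) @ Yr) for _ in range(n_perm)])
    sig_x = np.mean(null_x >= var_x) < alpha
    sig_y = np.mean(null_y >= var_y) < alpha
    return var_x, var_y, sig_x, sig_y
```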

Scenarios (i) and (iii) lead us to consider lower values of k due to possible over-correction. The former means we have revealed case-specific variation, yet we may be able to refine the signal if we can identify it using a lower k, and the latter indicates that all the structure in the data has been removed. Scenario (iv) indicates a violation of our model assumptions, which leads to termination, and lastly, scenario (ii) indicates residual shared variation, which suggests increasing k. Empirically, we found that imposing a threshold γ on the maximum ratio between the variance of X's PC and the variance it explains in Y under scenario (ii) is more stable and improves performance. Specifically, if the ratio between the variances of X and Y is greater than the predefined threshold, then we handle this case similarly to scenario (i); otherwise, we assume under-correction and consider a larger k. Throughout our analysis, unless stated otherwise, we used γ = 10.
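Putting the four scenarios and the γ rule together, the decision step of the search might be sketched as follows (illustrative only; the termination and refinement bookkeeping of Algorithm 2 is omitted).

```python
def decide_next_step(var_x, var_y, sig_x, sig_y, gamma=10.0):
    """Illustrative decision rule for the binary search over k, following the
    four scenarios described in the text."""
    if sig_x and not sig_y:
        return "decrease_k"    # (i) case-specific variation found; try to refine with smaller k
    if not sig_x and not sig_y:
        return "decrease_k"    # (iii) all structure removed; likely over-correction
    if sig_y and not sig_x:
        return "terminate"     # (iv) violates the model assumptions
    # (ii) significant variation in both cases and controls: apply the gamma ratio rule
    if var_x / var_y > gamma:
        return "decrease_k"    # treated like scenario (i)
    return "increase_k"        # under-correction; remove more shared axes
```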
