Principal component analysis (PCA) and independent component analysis (ICA) are blind source separation techniques that seek to unmix data to identify the underlying, unknown sources (13). Here, the goal of using these feature extraction techniques was to isolate and remove common-mode error (CME), which appears as spatially correlated noise in the GPS data, and to separate it from the hydrologic load signal. PCA is a statistical approach that uses an orthogonal linear transformation to reproject the data from a set of possibly correlated variables onto a set of linearly uncorrelated variables of maximum variance, called principal components (PCs). The PCs, which represent the temporal basis functions of the data, were generated by projecting the data onto a set of orthonormal basis vectors derived from an eigenvalue decomposition of the data covariance matrix. These eigenvectors describe the spatial pattern of the data and are referred to here as spatial responses. The components are ordered by the percentage of variance they explain: the first component represents the source of highest variance and contributes the most motion to the GPS network. Because the covariance matrix of the data matrix (n GPS stations by t time samples) is full rank, the eigendecomposition yields n PCs and n spatial responses. However, most GPS time series do not follow a Gaussian distribution (that is, the underlying process is not Gaussian) (15), whereas PCA relies only on second-order statistics (variance) and assumes that the underlying sources are Gaussian. PCA is therefore not an optimal method here and is susceptible to mixing sources across different components.
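The eigendecomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the processing code used in this study: the function name `pca_gps` and the synthetic two-year daily series (one shared seasonal signal plus white noise at six hypothetical stations) are invented for the example.

```python
import numpy as np


def pca_gps(X):
    """PCA of an n-station by t-epoch displacement matrix X.

    Returns (variance_fraction, spatial_responses, pcs): column k of
    spatial_responses is the k-th eigenvector (one weight per station) and
    row k of pcs is the k-th principal component (a time series).
    """
    Xc = X - X.mean(axis=1, keepdims=True)       # remove each station's mean
    C = Xc @ Xc.T / (Xc.shape[1] - 1)            # n x n covariance between stations
    eigvals, eigvecs = np.linalg.eigh(C)         # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pcs = eigvecs.T @ Xc                         # project data onto the orthonormal basis
    return eigvals / eigvals.sum(), eigvecs, pcs


# Hypothetical example: six stations sharing one seasonal signal plus white noise.
rng = np.random.default_rng(0)
t = np.arange(730)                               # two years of daily epochs
common_mode = np.sin(2 * np.pi * t / 365.25)
X = np.outer(rng.uniform(0.8, 1.2, size=6), common_mode)
X += 0.2 * rng.standard_normal(X.shape)
frac, spatial, pcs = pca_gps(X)
print(frac.round(3))                             # first PC explains most of the variance
```

Because the PCs are projections onto the eigenvectors of the covariance matrix, their own covariance is diagonal: they are uncorrelated by construction, but nothing forces them to be independent.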

ICA is similar in concept to PCA; however, it finds sources of maximum statistical independence rather than minimum correlation. Independence is the stronger condition, because uncorrelated variables are not necessarily independent. ICA therefore has an advantage over PCA: it uses higher-order statistics (fourth order; in this case, negentropy) and assumes that the underlying components are non-Gaussian and statistically independent.
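Two claims in this paragraph can be checked numerically: that zero correlation does not imply independence, and that a negentropy measure separates Gaussian from non-Gaussian data. The sketch below is illustrative only; the log cosh contrast is the widely used negentropy approximation of Hyvärinen, assumed here rather than taken from this study, and all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# (a) Uncorrelated but dependent: y = x**2 is a deterministic function of x,
#     yet for symmetric x their linear correlation is ~0.
x = rng.uniform(-1.0, 1.0, 100_000)
y = x ** 2
linear_corr = np.corrcoef(x, y)[0, 1]
# A fourth-order cross-moment exposes the dependence; it would be ~0 if x and y
# were independent (theoretical value here: 1/7 - 1/15 = 8/105, about 0.076).
fourth_order = np.mean(x**2 * y**2) - np.mean(x**2) * np.mean(y**2)


# (b) Negentropy approximation J(u) ~ (E[G(u)] - E[G(v)])**2 with G = log cosh,
#     u standardized and v a standard Gaussian reference (Monte Carlo estimate).
def log_cosh(z):
    return np.log(np.cosh(z))


def negentropy_logcosh(u, n_ref=1_000_000, seed=2):
    u = (u - u.mean()) / u.std()
    v = np.random.default_rng(seed).standard_normal(n_ref)
    return (log_cosh(u).mean() - log_cosh(v).mean()) ** 2


j_gauss = negentropy_logcosh(rng.standard_normal(100_000))
j_laplace = negentropy_logcosh(rng.laplace(size=100_000))
print(f"corr={linear_corr:.4f}  4th-order={fourth_order:.4f}")
print(f"J(Gaussian)={j_gauss:.2e}  J(Laplace)={j_laplace:.2e}")  # Laplace is far larger
```

The Gaussian sample scores near zero (negentropy vanishes for a Gaussian), while the heavier-tailed Laplacian sample scores much higher; this is the kind of non-Gaussianity that ICA exploits and PCA ignores.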

To perform ICA, we organized the data matrix ($X_{n \times t}$) into n rows and t columns, with each element representing the displacement in a given direction (for example, the vertical component of GPS) measured by the nth GPS station. ICA was applied separately to the vertical, east, and north components. For each row, we subtracted the sample mean and then whitened the data. As shown in Eq. 1, the observed data matrix was assumed to be a transformation ($Q_{n \times r}$) of a set of r unknown time-varying sources ($S_{r \times t}$):

$$X_{n \times t} = Q_{n \times r}\, S_{r \times t} \tag{1}$$
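A minimal sketch of the preprocessing and of the model in Eq. 1, assuming synthetic data: two hypothetical sources (a seasonal sinusoid and Laplacian noise) mixed into four hypothetical stations by a random $Q$. The symmetric (ZCA) whitening used here is one common choice; the text does not state which whitening variant was applied, and `center_and_whiten` is an illustrative name.

```python
import numpy as np


def center_and_whiten(X):
    """Subtract each row's (station's) sample mean, then whiten the rows.

    Uses symmetric (ZCA) whitening V = E diag(d**-0.5) E^T, where C = E diag(d) E^T
    is the row covariance; afterwards the rows are uncorrelated with unit variance.
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    V = (E / np.sqrt(d)) @ E.T                   # assumes C is full rank (all d > 0)
    return V @ Xc, V


# Hypothetical example of Eq. 1: r = 2 sources observed at n = 4 stations.
rng = np.random.default_rng(3)
t = np.arange(1000)
S = np.vstack([np.sin(2 * np.pi * t / 365.25),   # seasonal, load-like source
               rng.laplace(size=t.size)])        # non-Gaussian noise-like source
Q = rng.standard_normal((4, 2))                  # unknown mixing matrix Q (n x r)
X = Q @ S + 0.05 * rng.standard_normal((4, t.size))   # X = QS plus station noise
Xw, V = center_and_whiten(X)
print(np.round(Xw @ Xw.T / t.size, 6))           # approximately the 4 x 4 identity
```

With more stations than sources, whitening also rescales the weak noise directions to unit variance; in practice the data are often reduced to the leading r whitened components before ICA.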

The task of ICA is to determine the unmixing matrix ($W = Q^{-1}$) so that the underlying sources ($S$) can be recovered from the data ($X$), that is, $S = WX$. In PCA, $Q$ is an orthogonal linear transformation chosen so that the rows of $S$ are uncorrelated and of maximal variance, whereas ICA uses the same linear model (Eq. 1) but chooses $Q$, which need not be orthogonal, to maximize the statistical independence of the rows of $S$ through a nonlinear (non-Gaussianity) objective. Here, we used the reconstruction ICA (rICA) approach (9) to estimate the unknown sources. rICA differs from other ICA methods such as FastICA (35) in that it replaces the orthonormality constraint on $W$ (that is, $WW^T = I$) with a reconstruction penalty term added explicitly to the objective function, which allows the use of unconstrained solvers [see equation 2 of (9)].
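The reconstruction-cost idea can be sketched as follows. This is a hedged illustration, not the implementation used in this study or in (9): it assumes the objective $\frac{\lambda}{t}\lVert W^T W X - X\rVert_F^2 + \frac{1}{t}\sum \log\cosh(WX)$ on whitened data (the averaging over samples, $\lambda = 1$, and the log cosh sparsity penalty are assumptions), and it minimizes it by plain gradient descent as a stand-in for a quasi-Newton solver such as L-BFGS. No constraint is imposed on $W$; the reconstruction term alone keeps its rows near-orthogonal. All names and data are hypothetical.

```python
import numpy as np


def whiten(X):
    """Center each row and apply symmetric (ZCA) whitening."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    return (E / np.sqrt(d)) @ E.T @ Xc


def rica_cost_and_grad(W, X, lam):
    """Reconstruction penalty plus smooth sparsity penalty, averaged over samples.

    cost = lam/t * ||W^T W X - X||_F^2 + 1/t * sum(log cosh(W X))
    No orthonormality constraint is placed on W (r x n).
    """
    t = X.shape[1]
    Y = W @ X                                    # r x t source estimates
    R = W.T @ Y - X                              # n x t reconstruction residual
    cost = lam * np.sum(R**2) / t + np.sum(np.log(np.cosh(Y))) / t
    grad = (2 * lam / t) * W @ (X @ R.T + R @ X.T) + (np.tanh(Y) @ X.T) / t
    return cost, grad


def rica(X, r, lam=1.0, lr=0.05, n_iter=2000, seed=0):
    """Minimize the rICA-style cost with plain (unconstrained) gradient descent."""
    n = X.shape[0]
    W = np.linalg.qr(np.random.default_rng(seed).standard_normal((n, n)))[0][:r]
    history = []
    for _ in range(n_iter):
        cost, grad = rica_cost_and_grad(W, X, lam)
        history.append(cost)
        W = W - lr * grad
    return W, history


# Hypothetical example: two Laplacian (non-Gaussian) sources, non-orthogonal mixing.
rng = np.random.default_rng(42)
S = rng.laplace(size=(2, 5000))
Q = np.array([[1.0, 0.6], [0.4, 1.0]])           # plays the role of Q in Eq. 1
X = whiten(Q @ S)
W, history = rica(X, r=2)
S_hat = W @ X                                    # estimated sources (rows)
corr = np.corrcoef(np.vstack([S, S_hat]))[:2, 2:]   # true (rows) vs estimated (cols)
print(np.abs(corr).max(axis=1))                  # near 1 when separation succeeds
```

ICA recovers sources only up to sign, scale, and order, which is why the check takes the largest absolute correlation for each true source rather than comparing rows directly.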
