Principal component analysis (PCA) is a statistical procedure mostly used to reduce the dimensionality of data while losing as little information as possible. Its use in the study of mortality is not new: it has been applied in parameter estimation proposals for mortality forecasting (Lee & Carter, 1992; Booth et al., 2002; Renshaw & Haberman, 2006; Hyndman & Ullah, 2007). Functional principal component analysis (FPCA) is the extension of classical multivariate PCA to functional data. In our work, we use FPCA for clustering purposes but also for data projection and the interpretation of the curves.
As in the multivariate case, FPCA provides a way of looking at the covariance structure that can be much more informative than, and can complement, a direct examination of the variance-covariance function. The values of the variables in PCA are replaced by function values $x_i(t)$ in FPCA, and the discrete index by the continuous index $t$. Given $n$ functional observations $x_i(t)$ with $i = 1, \ldots, n$, and $\bar{x}(t)$ as the estimate of the mean function, the estimated covariance function, analogous to the covariance matrix in the multivariate case, is defined as:
$$\hat{v}(s,t) = \frac{1}{n-1} \sum_{i=1}^{n} \left[x_i(s) - \bar{x}(s)\right]\left[x_i(t) - \bar{x}(t)\right].$$
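On a discrete grid of evaluation points, the estimated covariance function becomes a matrix. The following is a minimal sketch in Python with NumPy, assuming the curves are sampled on a common grid and that the divisor is n - 1. The function name `covariance_function` is hypothetical, and the synthetic curves are illustrative stand-ins, not mortality data.

```python
import numpy as np

def covariance_function(X):
    """Estimate v(s, t) on the grid from curves X of shape (n_curves, n_points)."""
    n = X.shape[0]
    mean_curve = X.mean(axis=0)          # pointwise estimate of the mean function
    centered = X - mean_curve            # x_i(t) - mean(t) for every curve
    return centered.T @ centered / (n - 1)

# Synthetic curves (assumption: purely illustrative data)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
X = (rng.normal(size=(30, 1)) * np.sin(2 * np.pi * t)
     + 0.5 * rng.normal(size=(30, 1)) * np.cos(2 * np.pi * t)
     + 0.05 * rng.normal(size=(30, 50)))

V = covariance_function(X)               # V[j, k] approximates v(t_j, t_k)
```

The resulting matrix is symmetric, and its diagonal holds the pointwise variance of the curves.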
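The spectral decomposition finds the most important modes of variation in the covariance or correlation matrix of the curves. It provides a countable set of positive eigenvalues $\lambda_l$ associated with an expansion in orthonormal basis functions $\xi_l(t)$, with $l = 1, 2, \ldots$, such that the estimated covariance function $\hat{v}(s,t)$ can be written as
$$\hat{v}(s,t) = \sum_{l=1}^{\infty} \lambda_l \, \xi_l(s) \, \xi_l(t).$$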
In standard terminology, the basis functions are the eigenfunctions or harmonics; they define the most important modes of variation in the curves and are orthogonal to each other. The eigenvalues measure the variability in the directions corresponding to the eigenfunctions.
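Numerically, the eigenfunctions and eigenvalues can be approximated by discretizing the covariance function and solving a symmetric matrix eigenproblem. The sketch below assumes a uniform grid and a simple rectangle-rule quadrature with spacing `h`. The function name `fpca_eigen` and the synthetic data are hypothetical, and this is not necessarily the estimation procedure used in the study.

```python
import numpy as np

def fpca_eigen(X, t):
    """Return eigenvalues and eigenfunctions (as columns) of the sample covariance."""
    h = t[1] - t[0]                            # uniform grid spacing (assumption)
    centered = X - X.mean(axis=0)
    V = centered.T @ centered / (X.shape[0] - 1)
    evals, evecs = np.linalg.eigh(V * h)       # discretized integral operator
    order = np.argsort(evals)[::-1]            # largest eigenvalue first
    evals = evals[order]
    efuns = evecs[:, order] / np.sqrt(h)       # rescale so h * sum(xi_l ** 2) = 1
    return evals, efuns

# Synthetic curves (assumption: purely illustrative data)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
X = (rng.normal(size=(30, 1)) * np.sin(2 * np.pi * t)
     + 0.5 * rng.normal(size=(30, 1)) * np.cos(2 * np.pi * t)
     + 0.05 * rng.normal(size=(30, 50)))

evals, efuns = fpca_eigen(X, t)
h = t[1] - t[0]
gram = h * efuns.T @ efuns                     # approximates the inner products of the xi_l
```

Under these assumptions, `gram` is close to the identity matrix, which is the discrete counterpart of the orthonormality of the eigenfunctions.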
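The projection of the centered curve $x_i(t) - \bar{x}(t)$, where $\bar{x}(t)$ is the estimated mean function, in the direction of the eigenfunctions $\xi_l(t)$, which are defined on the same interval as the functional data, provides us with the functional principal component scores, a set of zero-mean, linearly uncorrelated random variables with variance equal to the corresponding eigenvalue $\lambda_l$. As $x_i(t)$ and $\xi_l(t)$ are functions, summations over variables in the multivariate context are replaced by integrations over $t$ to define an inner product. Thus, the principal component scores of the $i$th curve are defined as
$$f_{il} = \int \xi_l(t) \left[x_i(t) - \bar{x}(t)\right] dt, \quad l = 1, 2, \ldots$$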
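The inner product that defines the scores can be approximated by numerical integration. The sketch below is a hedged illustration that assumes a uniform grid and rectangle-rule quadrature. The name `fpca_scores` and the synthetic curves are hypothetical.

```python
import numpy as np

def fpca_scores(X, t):
    """Return eigenvalues, eigenfunctions and principal component scores."""
    h = t[1] - t[0]                            # uniform grid spacing (assumption)
    centered = X - X.mean(axis=0)
    V = centered.T @ centered / (X.shape[0] - 1)
    evals, evecs = np.linalg.eigh(V * h)
    order = np.argsort(evals)[::-1]
    evals, efuns = evals[order], evecs[:, order] / np.sqrt(h)
    # Score of curve i on component l: integral of xi_l(t) * (x_i(t) - mean(t)) dt
    scores = h * centered @ efuns
    return evals, efuns, scores

# Synthetic curves (assumption: purely illustrative data)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
X = (rng.normal(size=(30, 1)) * np.sin(2 * np.pi * t)
     + 0.5 * rng.normal(size=(30, 1)) * np.cos(2 * np.pi * t)
     + 0.05 * rng.normal(size=(30, 50)))

evals, efuns, scores = fpca_scores(X, t)
score_cov = np.cov(scores, rowvar=False)       # sample covariance of the scores
```

With this discretization, the scores have zero sample mean and `score_cov` is diagonal with the eigenvalues on its diagonal, mirroring the properties stated above.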
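The Karhunen-Loève decomposition allows the curve to be expressed through its functional principal component expansion
$$x_i(t) = \bar{x}(t) + \sum_{l=1}^{\infty} f_{il} \, \xi_l(t),$$
where $\bar{x}(t)$ is the mean function, $\xi_l(t)$ are the eigenfunctions, and $f_{il}$ are the principal component scores of the $i$th curve.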
Therefore, FPCA provides us with a set of basis functions and returns each functional observation as a linear combination of these new basis functions, where the coefficient of the $l$-th eigenfunction $\xi_l(t)$ is the estimated score $f_{il}$ of the $l$-th principal component of the corresponding curve. The Karhunen-Loève decomposition facilitates dimension reduction: if the first $q$ terms (for a large enough $q$) provide a good approximation to the infinite sum, the information contained in the curve $x_i(t)$ is essentially summarized by the $q$-dimensional vector $(f_{i1}, \ldots, f_{iq})$, and one can work with this approximation.
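The truncation to the first q terms can be sketched as follows. One common way of choosing q, used here as an assumption rather than as the rule applied in the study, is to take the smallest q whose cumulative share of the total variance reaches a threshold (0.95 in this illustration). The name `truncated_fpca`, the threshold, and the synthetic curves are hypothetical.

```python
import numpy as np

def truncated_fpca(X, t, threshold=0.95):
    """Return q, the explained-variance profile, the q scores and the approximation."""
    h = t[1] - t[0]                            # uniform grid spacing (assumption)
    mean_curve = X.mean(axis=0)
    centered = X - mean_curve
    V = centered.T @ centered / (X.shape[0] - 1)
    evals, evecs = np.linalg.eigh(V * h)
    order = np.argsort(evals)[::-1]
    evals, efuns = evals[order], evecs[:, order] / np.sqrt(h)
    explained = np.cumsum(evals) / np.sum(evals)
    q = int(np.argmax(explained >= threshold)) + 1   # smallest q reaching the threshold
    scores_q = h * centered @ efuns[:, :q]           # q-dimensional summary of each curve
    approx = mean_curve + scores_q @ efuns[:, :q].T  # truncated expansion
    return q, explained, scores_q, approx

# Synthetic curves (assumption: purely illustrative data)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
X = (rng.normal(size=(30, 1)) * np.sin(2 * np.pi * t)
     + 0.5 * rng.normal(size=(30, 1)) * np.cos(2 * np.pi * t)
     + 0.05 * rng.normal(size=(30, 50)))

q, explained, scores_q, approx = truncated_fpca(X, t)
# Share of the total variation that the truncated expansion fails to reproduce
rel_err = np.sum((X - approx) ** 2) / np.sum((X - X.mean(axis=0)) ** 2)
```

Each row of `scores_q` is the reduced representation of one curve, and `rel_err` equals one minus the explained share at q, so it stays below the complement of the chosen threshold.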
FPCA is useful for the dimension reduction of the curves in all the clustering approaches applied to the low-mortality countries. In addition, the eigenfunctions allow the identification of the main directions of variability in the complete mortality profile with respect to the mean curve, and the corresponding scores for every curve can be used to characterize the countries in the clusters in a reduced dimensional space.