Functional Principal Component Analysis

Ainhoa-Elena Léger; Stefano Mazzuco

This method is extracted from research article: Eur J Popul, Jun 2021

What Can We Learn from the Functional Clustering of Mortality Data? An Application to the Human Mortality Database

DOI: 10.1007/s10680-021-09588-y

Principal component analysis (PCA) is a statistical procedure used mainly to reduce the dimensionality of data while losing as little information as possible. Its use in the study of mortality is not new: it has been applied, together with parameter estimation proposals, in mortality forecasting (Lee & Carter, 1992; Booth et al., 2002; Renshaw & Haberman, 2006; Hyndman & Ullah, 2007). Functional principal component analysis (FPCA) extends classical multivariate PCA to functional data. In our work, we use FPCA for clustering, and also for projecting the data and interpreting the curves.

As in the multivariate case, FPCA provides a way of looking at the covariance structure that can be much more informative than, and can complement, a direct examination of the variance-covariance function. The variable values of PCA are replaced in FPCA by function values $x_{i} (t)$, and the discrete index by the continuous index $t$. Given $n$ functional observations $x_{i} (t)$, $1 \leq i \leq n$, and the estimated mean function $\bar{x} (t)$, the estimated covariance function, analogous to the covariance matrix in the multivariate case, is defined as:

$$\hat{v} (s, t) = \frac{1}{n - 1} \sum_{i = 1}^{n} \left( x_{i} (s) - \bar{x} (s) \right) \left( x_{i} (t) - \bar{x} (t) \right)$$
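When the curves are observed on a common grid, this estimator reduces to a matrix product. The following is a minimal sketch in Python with NumPy, assuming the usual sample estimator with $1/(n-1)$ normalization; the function name `mean_and_covariance` and the toy curves are hypothetical illustrations, not the authors' code or data.

```python
import numpy as np


def mean_and_covariance(X):
    """Pointwise mean and sample covariance surface of discretized curves.

    X has shape (n, m): row i holds x_i(t) on a common grid of m points.
    Assumption: sample covariance with 1/(n - 1) normalization.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    x_bar = X.mean(axis=0)                   # estimate of the mean function
    centered = X - x_bar                     # x_i(t) - x_bar(t) on the grid
    v_hat = centered.T @ centered / (n - 1)  # v_hat[j, k] estimates v(t_j, t_k)
    return x_bar, v_hat


# Hypothetical toy curves on [0, 1] (not mortality data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
a = rng.normal(size=(50, 1))
b = rng.normal(scale=0.5, size=(50, 1))
X = 1.0 + a * np.sin(2 * np.pi * t) + b * np.cos(2 * np.pi * t)

x_bar, v_hat = mean_and_covariance(X)
print(v_hat.shape)  # (101, 101)
```

The result is a symmetric matrix whose entry $(j, k)$ approximates the covariance between the curve values at grid points $t_j$ and $t_k$.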

The spectral decomposition identifies the most important modes of variation in the covariance (or correlation) structure of the curves. It yields a countable set of positive eigenvalues $λ_{1} \geq λ_{2} \geq \dots$ and associated orthonormal basis functions $ϕ_{l} (t)$, $l = 1, 2, \dots$, such that

$$\hat{v} (s, t) = \sum_{l = 1}^{\infty} λ_{l} \, ϕ_{l} (s) \, ϕ_{l} (t)$$

In standard terminology, the basis functions $ϕ_{l} (t)$ are the eigenfunctions or harmonics; they define the most important modes of variation in the curves and are orthogonal to each other. The eigenvalues measure the variability in the directions corresponding to the eigenfunctions.
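On a grid, the eigenvalues and eigenfunctions can be approximated by an eigendecomposition of the discretized covariance surface. The sketch below is one possible discretization, not necessarily the one used by the authors: it assumes trapezoidal quadrature weights so that the eigenfunctions are orthonormal with respect to the integral inner product. The name `fpca_eigen` and the toy curves are hypothetical.

```python
import numpy as np


def fpca_eigen(v_hat, t):
    """Approximate eigenvalues and eigenfunctions of a covariance surface.

    v_hat: (m, m) covariance surface on the grid t (length m).
    Returns eigenvalues (descending), eigenfunctions as columns of an
    (m, m) array, and the quadrature weights used for integration.
    """
    t = np.asarray(t, dtype=float)
    # Trapezoidal weights: sum(w * f(t)) approximates the integral of f.
    w = np.empty_like(t)
    w[0] = (t[1] - t[0]) / 2.0
    w[-1] = (t[-1] - t[-2]) / 2.0
    w[1:-1] = (t[2:] - t[:-2]) / 2.0
    sw = np.sqrt(w)
    # Symmetrized problem W^(1/2) V W^(1/2) u = lambda u.
    eigvals, U = np.linalg.eigh(sw[:, None] * v_hat * sw[None, :])
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negative round-off
    phi = U[:, order] / sw[:, None]              # phi_l = W^(-1/2) u_l
    return eigvals, phi, w


# Hypothetical toy curves on [0, 1] with two modes of variation.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
a = rng.normal(size=(50, 1))
b = rng.normal(scale=0.5, size=(50, 1))
X = 1.0 + a * np.sin(2 * np.pi * t) + b * np.cos(2 * np.pi * t)

eigvals, phi, w = fpca_eigen(np.cov(X, rowvar=False), t)
gram = phi.T @ (w[:, None] * phi)  # approximates the integrals of phi_l * phi_k
```

Here `gram` is close to the identity matrix, which is the discrete counterpart of the orthonormality of the eigenfunctions, and only the first two eigenvalues are appreciably different from zero because the toy curves were built from two components.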

The projection of the mean-centered $x_{i} (t)$ onto the eigenfunctions $ϕ_{l} (t)$, which are defined on the same interval as the functional data, provides the functional principal components: a set of zero-mean, linearly uncorrelated random variables with variance $λ_{l}$. Because $x_{i} (t)$ and $ϕ_{l} (t)$ are functions, the summations over variables used in the multivariate context are replaced by integrations over $t$ to define an inner product. Thus, the principal component scores of the $i$-th curve are defined as

$$c_{il} = \int \left( x_{i} (t) - \bar{x} (t) \right) ϕ_{l} (t) \, dt$$
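Numerically, the integral that defines the scores becomes a weighted sum over the grid. The sketch below assumes mean-centered curves, trapezoidal quadrature, and a $1/(n-1)$ covariance normalization; `trapezoid_weights`, `fpca_scores`, and the toy curves are hypothetical names and data used only for illustration.

```python
import numpy as np


def trapezoid_weights(t):
    """Weights w such that sum(w * f(t)) approximates the integral of f."""
    t = np.asarray(t, dtype=float)
    w = np.empty_like(t)
    w[0] = (t[1] - t[0]) / 2.0
    w[-1] = (t[-1] - t[-2]) / 2.0
    w[1:-1] = (t[2:] - t[:-2]) / 2.0
    return w


def fpca_scores(X, t, q):
    """Scores of the first q functional principal components.

    X: (n, m) curves on the grid t. Returns scores (n, q),
    eigenvalues (q,), and eigenfunctions (m, q).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    w = trapezoid_weights(t)
    sw = np.sqrt(w)
    centered = X - X.mean(axis=0)
    v_hat = centered.T @ centered / (n - 1)
    eigvals, U = np.linalg.eigh(sw[:, None] * v_hat * sw[None, :])
    order = np.argsort(eigvals)[::-1][:q]   # indices of the q largest eigenvalues
    lam = eigvals[order]
    phi = U[:, order] / sw[:, None]
    # c_il = integral of (x_i(t) - x_bar(t)) * phi_l(t) dt, by quadrature.
    scores = (centered * w) @ phi
    return scores, lam, phi


# Hypothetical toy curves on [0, 1].
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
a = rng.normal(size=(50, 1))
b = rng.normal(scale=0.5, size=(50, 1))
X = 1.0 + a * np.sin(2 * np.pi * t) + b * np.cos(2 * np.pi * t)

scores, lam, phi = fpca_scores(X, t, q=2)
score_cov = np.cov(scores, rowvar=False)  # approximately diag(lam)
```

In this discretization the sample covariance matrix of the scores is diagonal with the eigenvalues on the diagonal, which mirrors the statement that the components are uncorrelated with variance $λ_{l}$.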

The Karhunen-Loève decomposition allows each curve to be expressed through its functional principal component expansion

$$x_{i} (t) = \bar{x} (t) + \sum_{l = 1}^{\infty} c_{il} \, ϕ_{l} (t)$$

Therefore, FPCA provides a set of basis functions $ϕ_{1} (t), ϕ_{2} (t), \dots$ and expresses each functional observation as a linear combination of these new basis functions, where the coefficient of $ϕ_{l} (t)$ is the estimated score of the $l$-th principal component of the corresponding curve. The Karhunen-Loève decomposition facilitates dimension reduction: if the first $q$ terms (for a large enough $q$) provide a good approximation to the infinite sum, the information contained in the curve $x_{i} (t)$ is essentially summarized by the $q$-dimensional vector $c_{i} = (c_{i1}, \dots, c_{iq})$, and one can work with this approximation.
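The truncation to $q$ terms can be illustrated with a singular value decomposition of the centered, quadrature-weighted data matrix, which yields the same eigenvalues, eigenfunctions, and scores as an eigendecomposition of the covariance surface. This is a sketch under stated assumptions (common grid, trapezoidal weights, $1/(n-1)$ normalization); `fpca_truncate` and the noisy toy curves are hypothetical, and the share of variance explained is only one of several possible criteria for choosing $q$.

```python
import numpy as np


def fpca_truncate(X, t, q):
    """Rank-q functional principal component approximation of the curves.

    Returns the reconstructed curves (n, m), the score vectors c_i (n, q),
    and the share of total variance carried by the first q components.
    """
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float)
    n = X.shape[0]
    w = np.empty_like(t)                      # trapezoidal quadrature weights
    w[0] = (t[1] - t[0]) / 2.0
    w[-1] = (t[-1] - t[-2]) / 2.0
    w[1:-1] = (t[2:] - t[:-2]) / 2.0
    sw = np.sqrt(w)
    x_bar = X.mean(axis=0)
    # SVD of the centered curves scaled by the square root of the weights.
    A, s, Bt = np.linalg.svd((X - x_bar) * sw, full_matrices=False)
    lam = s**2 / (n - 1)                      # eigenvalues lambda_l
    phi = Bt.T / sw[:, None]                  # eigenfunctions phi_l on the grid
    scores = A * s                            # scores c_il
    X_hat = x_bar + scores[:, :q] @ phi[:, :q].T
    explained = lam[:q].sum() / lam.sum()
    return X_hat, scores[:, :q], explained


# Hypothetical toy curves: two smooth components plus small noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
a = rng.normal(size=(50, 1))
b = rng.normal(scale=0.5, size=(50, 1))
X = 1.0 + a * np.sin(2 * np.pi * t) + b * np.cos(2 * np.pi * t)
X = X + 0.05 * rng.normal(size=X.shape)

X_hat, c, explained = fpca_truncate(X, t, q=2)
print(c.shape)  # (50, 2)
```

Each row of `c` is the $q$-dimensional vector that stands in for one curve; clustering or plotting can then operate on these rows instead of the full curves.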

FPCA is useful for the dimension reduction of the curves in all the clustering approaches applied to the low-mortality countries. In addition, the eigenfunctions allow the identification of the main directions of variability in the complete mortality profile with respect to the mean curve, and the corresponding scores for every curve can be used to characterize the countries in the clusters in a reduced dimensional space.

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
