Calibrating SVD-Comp to the Relationship Between 5q0 and Mortality at Other Ages in the HMD

Samuel J. Clark

All computation is carried out using the R statistical programming environment (R Foundation for Statistical Computing 2016).

The life tables of the HMD are arranged into two A × L matrices (Qz) of single-year, age-specific life table probabilities of dying (1qx), one for each sex, where A = number of age groups = 110; L = number of life tables = 4,610; and z ∈ {female, male}. The SVD of each Qz yields ρ LSVs, uzi; RSVs, vzi; and SVs, sz. To ensure that all age groups have approximately the same influence when calculating the SVDs, each mortality schedule is offset from the origin by −10, and the offset is added back to predicted mortality schedules. Four of the new dimensions identified by each SVD are retained; that is, c = 4 in Eq. (11). For females, those account for 0.998328, 0.000936, 0.000071, and 0.000058 of the total sum of squares, respectively, or together 0.999392. Corresponding figures for males are 0.998595, 0.000824, 0.000103, and 0.000052, and together 0.999575. Section C of the online appendix contains additional information on the total sum of squares explained by each component of the SVD.
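The decomposition step can be illustrated with a minimal sketch. It is written in Python with NumPy rather than R, it uses a small synthetic matrix in place of the HMD life tables, and it assumes the schedules are on a logit-like scale before the offset is applied. The names `Q`, `OFFSET`, `share`, and `Q_hat` are illustrative and do not come from the original analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one sex's A x L matrix (NOT HMD data):
# A = 110 ages, L = 50 schedules, values on a logit-like scale (assumption).
A, L = 110, 50
ages = np.arange(A)
base = -8.0 + 0.07 * ages                    # common age pattern
level = rng.normal(0.0, 1.0, size=L)         # schedule-specific level
noise = rng.normal(0.0, 0.05, size=(A, L))
Q = base[:, None] + level[None, :] + noise

OFFSET = -10.0                               # offset applied before the SVD
U, s, Vt = np.linalg.svd(Q + OFFSET, full_matrices=False)

# Share of the total sum of squares captured by each component.
share = s**2 / np.sum(s**2)

# Rank-c reconstruction from the first c components, then undo the offset.
c = 4
Q_hat = (U[:, :c] * s[:c]) @ Vt[:c, :] - OFFSET

print("share of sum of squares, first 4 components:", share[:c])
```

Because the offset moves every schedule well away from the origin, the first component absorbs most of the total sum of squares, which is in line with the very large first-component shares reported above.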

Based on Eqs. (9) and (10), regression models are defined that relate the RSVs vzi to 5q0z and 45q15z. Scatterplots of the elements of the RSVs versus logit(5q0) in Figs. E1 and E2 in the online appendix make it clear that the relationships are not linear or simple. With no theory to guide the choice of predictors, I tried all combinations of simple transformations of logit(5q0) and logit(45q15) and their interactions. The resulting models explain almost all the variance in the elements of v1 (R2 ≈ 97 % for both sexes), the vast majority of the variance in the elements of v2 (R2 ≈ 87 % for both sexes), and one-third to one-half the variance in the elements of v3 and v4. Additionally, I tried to avoid overfitting or creating odd boundary effects in the predicted values that would have made out-of-sample predictions immediately implausible. These models behave sensibly up to the edges of the sample. The final models are

where i ∈ {1 : 4} indexes the SVD dimensions, and ℓ indexes mortality schedules and elements of vzi. OLS regression is used to estimate coefficients for the eight regression models defined in Eq. (12), and the estimated values are contained in online appendix D, Tables D1 and D2. With new values for both 5q0 and 45q15 as inputs, these models are used to predict values for the weights in Eq. (11); that is, for prediction, vzℓi on the left-hand side is replaced with ŵzi.
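The fitting and prediction steps for one sex and one SVD dimension can be sketched as follows. The predictor set in `design` is purely illustrative and is not the specification of Eq. (12); the data are synthetic, and `true_beta` is an invented set of coefficients used only to generate them. The sketch uses Python with NumPy rather than R.

```python
import numpy as np

def logit(p):
    """Log-odds transform applied to the mortality summaries."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))

def design(q5_0, q45_15):
    """Illustrative predictors only; NOT the terms of Eq. (12)."""
    x, y = logit(q5_0), logit(q45_15)
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2])

rng = np.random.default_rng(1)
n = 200
q5_0 = rng.uniform(0.003, 0.3, size=n)     # synthetic child mortality
q45_15 = rng.uniform(0.05, 0.5, size=n)    # synthetic adult mortality

# Synthetic "RSV elements" built from invented coefficients plus noise.
true_beta = np.array([0.02, 0.004, 0.003, 0.0005, 0.0002])
X = design(q5_0, q45_15)
v = X @ true_beta + rng.normal(0.0, 1e-4, size=n)

# OLS fit and R^2.
beta_hat, *_ = np.linalg.lstsq(X, v, rcond=None)
resid = v - X @ beta_hat
r2 = 1.0 - (resid @ resid) / np.sum((v - v.mean()) ** 2)

# A new (5q0, 45q15) pair yields one predicted weight for this dimension.
w_hat = design([0.05], [0.20]) @ beta_hat
```

In the setting described above, one such fit would be repeated for each of the four retained dimensions and each sex, giving eight models in total.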

To accommodate a one-parameter model that uses only 5q0 as an input, I define a regression model that relates adult mortality logit(45q15)z to child mortality 5q0z. The scatterplot of logit(45q15) versus logit(5q0) in Fig. E3 in the online appendix reveals a slightly complicated relationship that is neither linear nor systematically curvilinear. Again, without theory as a guide, I tried a variety of models, including various simple transformations of 5q0. The resulting models explain most of the variance in logit(45q15) (R2 = 93 % for females and 79 % for males). The final models are

OLS regression is used to estimate coefficients for the two regression models defined by Eq. (13), and the estimated coefficients are contained in Table D3 in the online appendix. These models are used to predict values for 45q15 when only 5q0 is supplied as an input. Then both the input value for 5q0 and the predicted value for 45q15 are used in Eq. (12) to predict the weights in Eq. (11).
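The one-parameter chain (5q0 in, predicted 45q15 out, both passed on to the weight models) can be sketched as below. The quadratic form in `adult_design` is a placeholder and is not Eq. (13); the training data are synthetic, and all function and variable names are hypothetical. The sketch uses Python with NumPy rather than R.

```python
import numpy as np

def logit(p):
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))

def expit(x):
    """Inverse of logit; maps log-odds back to a probability."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

def adult_design(q5_0):
    """Illustrative terms only; NOT the specification of Eq. (13)."""
    x = logit(q5_0)
    return np.column_stack([np.ones_like(x), x, x**2])

rng = np.random.default_rng(2)
q5_0 = rng.uniform(0.003, 0.3, size=300)   # synthetic child mortality
# Synthetic logit(45q15), loosely increasing with logit(5q0).
y = -0.5 + 0.4 * logit(q5_0) + rng.normal(0.0, 0.2, size=300)

gamma_hat, *_ = np.linalg.lstsq(adult_design(q5_0), y, rcond=None)

def predict_q45_15(q5_0_new):
    """Predicted adult mortality when only 5q0 is supplied."""
    return expit(adult_design(q5_0_new) @ gamma_hat)

q5_0_new = np.array([0.01, 0.05, 0.15])
q45_15_hat = predict_q45_15(q5_0_new)

# Both quantities would then be the inputs to the weight models.
weight_inputs = np.column_stack([logit(q5_0_new), logit(q45_15_hat)])
```

Predicting on the logit scale and transforming back with `expit` keeps the predicted 45q15 strictly between 0 and 1.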

Figure E4 in the online appendix displays the relationship between logit(1q0) and logit(5q0). Mortality falls very rapidly in the first few years of life. Using the child mortality rate (5q0), a five-year summary of mortality between ages 0 and 5, as a predictor of single-year mortality within that same five-year age group is relatively uninformative. Experimentation reveals that 5q0 predicts 1q1 through 1q4 well and 1q0 slightly less well. The prediction of 1q0 can be improved by modeling the relationship between logit(1q0) and logit(5q0) separately as

OLS regression is used to estimate the coefficients of this model, displayed in Table D4 of the online appendix. The model explains essentially all the variance in logit(1q0) (R2 > 99 % for both sexes) and is used to predict values for 1q0 directly from the input value of 5q0.
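The link between 5q0 and the single-year probabilities it summarizes follows from the standard life table identity 5q0 = 1 − (1 − 1q0)(1 − 1q1)(1 − 1q2)(1 − 1q3)(1 − 1q4). The short sketch below applies that identity and shows one way a directly predicted 1q0 could be placed into a predicted schedule. It assumes the direct prediction simply replaces the age-0 element; the numbers are hypothetical and are not HMD values or model output.

```python
import numpy as np

def q5_0_from_single_year(q1x):
    """5q0 implied by the first five single-year probabilities 1q0..1q4."""
    q1x = np.asarray(q1x, dtype=float)
    return 1.0 - np.prod(1.0 - q1x[:5])

# Hypothetical predicted schedule (first five ages only).
schedule = np.array([0.030, 0.004, 0.002, 0.0015, 0.001])
q1_0_direct = 0.028          # hypothetical output of the separate 1q0 model

adjusted = schedule.copy()
adjusted[0] = q1_0_direct    # overwrite only the age-0 element (assumption)

implied_q5_0 = q5_0_from_single_year(adjusted)
print("5q0 implied by the adjusted schedule:", implied_q5_0)
```

Comparing `implied_q5_0` with the input 5q0 offers a simple consistency check on the adjusted schedule.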
