Simulating COVID-19 deaths involved 3 steps, with the last 2 repeated in each simulation run. This approach was necessary because cross-tabulations of COVID-19 deaths by age, sex, and comorbidity were unavailable; only univariate distributions were consistently reported. In the first step, marginal distributions of age, sex, and comorbidity were taken from multiple public health agencies and used to estimate prior distributions. In the second step, marginal distributions of age, sex, and comorbidity were randomly drawn from these priors and combined with the correlations among these variables in NHANES to approximate their joint distribution. In the third step, the joint distribution was used to reweight the NHANES sample to represent COVID-19 deaths. Each step is described in more detail below.
As joint distributions of the characteristics of COVID-19 deaths were unavailable, we first obtained their marginal distributions. We considered the marginal distributions of deaths by age, sex, the absence of comorbidity, and the presence of each of hypertension, diabetes, CKD, IHD, COPD, and cancer. These were obtained from the CDC, the United Kingdom’s Office for National Statistics, Santé Publique France, Istituto Superiore di Sanità in Italy, Instituto de Salud Carlos III in Spain, and the China Center for Disease Control and Prevention [35–41]; S1 Text provides further details. Comorbidity data from the CDC, the Office for National Statistics, and Santé Publique France were excluded from model fitting because they rely on death certificate data, which underestimate the prevalence of comorbidities compared with data reported by US hospitals and with systematically collected data from Italy, Spain, and China [42,43]. We then fit maximum-likelihood beta priors (or, for age, a Dirichlet prior) to each marginal distribution. For the purpose of estimating a joint distribution, race was represented by an indicator variable for non-Hispanic white and fixed to the proportion reported by the CDC, as this was the only agency that reported race/ethnicity.
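The prior-fitting step can be illustrated with a minimal sketch. It is not the authors' code: the function names `fit_dirichlet_mle` and `fit_beta_mle`, the optimizer settings, and the example proportions are assumptions made here for illustration only. The sketch fits a Dirichlet distribution by maximum likelihood to a set of agency-level compositions and treats the beta distribution as the two-category special case.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import digamma, gammaln


def fit_dirichlet_mle(compositions):
    """Maximum-likelihood Dirichlet concentration parameters.

    compositions: (n_sources, n_categories) array of proportions;
    each row must be strictly positive and sum to 1.
    """
    x = np.asarray(compositions, dtype=float)
    n = x.shape[0]
    sum_log_x = np.log(x).sum(axis=0)  # sufficient statistic

    def objective(log_alpha):
        # Optimise over log(alpha) so that alpha stays positive.
        alpha = np.exp(log_alpha)
        log_lik = n * (gammaln(alpha.sum()) - gammaln(alpha).sum())
        log_lik += np.dot(alpha - 1.0, sum_log_x)
        grad_alpha = n * (digamma(alpha.sum()) - digamma(alpha)) + sum_log_x
        # Return negative log-likelihood and its gradient w.r.t. log(alpha).
        return -log_lik, -grad_alpha * alpha

    start = np.log(10.0 * x.mean(axis=0))  # arbitrary starting point
    bounds = [(-10.0, 15.0)] * x.shape[1]  # keeps exp() numerically safe
    result = minimize(objective, start, jac=True, method="L-BFGS-B", bounds=bounds)
    return np.exp(result.x)


def fit_beta_mle(proportions):
    """Beta(a, b) fit, treated as a two-category Dirichlet."""
    p = np.asarray(proportions, dtype=float)
    a, b = fit_dirichlet_mle(np.column_stack([p, 1.0 - p]))
    return a, b


# Hypothetical placeholder proportions, NOT values reported by any agency.
hypertension_share = [0.68, 0.62, 0.55]
a, b = fit_beta_mle(hypertension_share)

# One draw from the fitted prior, as would be used in a single simulation run.
prior_draw = np.random.default_rng(0).beta(a, b)
```

With only a handful of sources per characteristic, the fitted prior mainly encodes between-agency variation; a wider spread in the reported proportions yields smaller concentration parameters and therefore more dispersed draws.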
The second step was to approximate the joint distribution of the aforementioned characteristics. To do this, we combined the marginal distributions for age, sex, and each of the comorbidities drawn from their respective priors, the fixed marginal distribution of non-Hispanic white individuals, and the correlations among these variables in NHANES, and estimated a joint distribution using a Gaussian copula [44]. In the third step, we assigned the joint probabilities as weights to each NHANES participant such that the participants would, in total, represent 200,000 deaths. Steps 2 and 3 were repeated 1,000 times. We checked calibration against CDC-published distributions of deaths by age, by sex, and by race/ethnicity. We report our results and data in terms of gender, as recorded in NHANES, although data from public health agencies are given in terms of sex.
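The copula and reweighting logic can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' implementation: all characteristics are treated as binary (age would need more than 2 categories), the sample Pearson correlation matrix is used directly as the latent Gaussian correlation, joint probabilities are approximated by Monte Carlo, and a synthetic sample stands in for NHANES. The function names and the target marginals are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def copula_joint_probabilities(marginals, corr, n_draws=200_000, seed=0):
    """Monte Carlo joint distribution of binary traits under a Gaussian copula.

    marginals: target P(trait_j = 1) for each trait.
    corr: correlation matrix of the latent normal variables.
    Returns a dict mapping each trait profile (tuple of 0/1) to its probability.
    """
    rng = np.random.default_rng(seed)
    marginals = np.asarray(marginals, dtype=float)
    z = rng.multivariate_normal(np.zeros(len(marginals)), corr, size=n_draws)
    # Thresholding at the normal quantile gives P(trait_j = 1) = marginals[j].
    traits = (z < norm.ppf(marginals)).astype(int)
    profiles, counts = np.unique(traits, axis=0, return_counts=True)
    return {tuple(p.tolist()): c / n_draws for p, c in zip(profiles, counts)}


def reweight_sample(sample_traits, joint, total=200_000):
    """Weight each participant so the sample represents `total` deaths.

    A profile's probability is shared equally among the participants who
    have that profile; profiles absent from the sample are dropped and
    the remaining weights are rescaled to sum to `total`.
    """
    keys = [tuple(row) for row in np.asarray(sample_traits, dtype=int).tolist()]
    cell_sizes = {}
    for key in keys:
        cell_sizes[key] = cell_sizes.get(key, 0) + 1
    raw = np.array([joint.get(key, 0.0) / cell_sizes[key] for key in keys])
    return total * raw / raw.sum()


# Synthetic stand-in for the survey sample: 500 people, 3 binary traits.
sample = np.random.default_rng(1).integers(0, 2, size=(500, 3))
corr = np.corrcoef(sample, rowvar=False)

# Hypothetical marginals, standing in for one draw from the fitted priors.
target_marginals = [0.6, 0.7, 0.3]

joint = copula_joint_probabilities(target_marginals, corr)
weights = reweight_sample(sample, joint)
```

In a full simulation this block would be run once per draw from the priors, giving one set of weights per run. Because each profile's probability is divided among the participants who share it, the weighted sample reproduces the target marginals as long as every profile with non-negligible probability occurs in the sample.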