We simulated data on the incidence of hypertension, but the results apply to any chronic illness with a long latency period and a long induction period relative to air pollution exposure. To simulate the data, we had to make assumptions about (1) the baseline characteristics that predict hypertension incidence in the population, and (2) the true nature of the relationship between PM2.5 and hypertension incidence. For assumption 1, we assumed that the incidence of hypertension is related to age, BMI, and SES at the start of follow-up (1999), as is the case in BWHS [24,25]. We assumed that event times follow a Gompertz distribution with a scale parameter of 0.0002 and a shape parameter of 0.49, values chosen to best reflect the observed time to event.
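As a minimal sketch of how such event times can be drawn, the inverse-transform approach for a Gompertz baseline hazard is shown below. It assumes the hazard is parameterized as h(t) = scale × exp(shape × t) × exp(linear predictor), which the text does not state explicitly. The function names are illustrative, and the linear predictor stands in for a weighted sum of age, BMI, and SES whose coefficients are not given here.

```python
import numpy as np


def gompertz_inverse(u, linpred, scale=0.0002, shape=0.49):
    """Solve S(t) = u for t under a Gompertz proportional-hazards model.

    Assumed parameterization (not confirmed by the source):
        h(t) = scale * exp(shape * t) * exp(linpred)
    """
    u = np.asarray(u, dtype=float)
    return np.log1p(-shape * np.log(u) / (scale * np.exp(linpred))) / shape


def simulate_event_times(linpred, rng=None, scale=0.0002, shape=0.49):
    """Draw one event time (years since baseline) per linear-predictor value."""
    rng = np.random.default_rng() if rng is None else rng
    linpred = np.asarray(linpred, dtype=float)
    # 1 - U lies in (0, 1], which avoids taking log(0)
    u = 1.0 - rng.uniform(size=linpred.shape)
    return gompertz_inverse(u, linpred, scale, shape)
```

In use, `linpred` would be built per woman from her baseline covariates, for example `b_age * age + b_bmi * bmi + b_ses * ses`, where the `b_*` coefficients are hypothetical placeholders.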
For assumption 2, we assumed that the incidence of hypertension is positively related to PM2.5, and that hypertension incidence over time reflects changes in PM2.5 levels over time. In BWHS data, PM2.5 at most locations decreases approximately linearly over time (Fig. 1). We simulated survival times under two different relationships between PM2.5 and the time to event. In both scenarios we simulated hypertension incidence assuming a linear change in PM2.5 over time, using the method described by Austin (see appendix) [26]. Thus each woman in the simulation has an individually estimated linear change in PM2.5, and her hazard of developing hypertension at any time is a function of the baseline hazard distribution, her baseline BMI, SES, and age, and the linearly changing PM2.5 at her address. In the first scenario, we fit a linear regression to each woman's PM2.5 values over time and associated the fitted PM2.5 value with outcomes one year later. An outcome in year t is therefore best predicted by the value of PM2.5 in year t − 1. In the second scenario, we assumed that cumulative exposure to PM2.5, from the start of follow-up to diagnosis or the end of follow-up, has the most influence on disease occurrence. In this case we fit a linear regression model to the time-varying cumulative average of PM2.5 for each woman and used this linear approximation of exposure to simulate her survival time.
Fig. 1. The black line shows the average PM2.5 value, and the shaded area shows the range of values observed across the entire cohort.
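The two exposure scenarios described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses the closed-form inverse of the cumulative hazard that arises when a covariate is linear in time and the baseline hazard is Gompertz, with the assumed parameterization h(t) = scale × exp(shape × t). The function names, the `lag` argument, and the coefficient `beta_pm` are hypothetical. Scenario 1 corresponds to `lag=1.0` with a line fit to the annual values, and scenario 2 to `cumulative=True` with no lag.

```python
import numpy as np


def fit_exposure_line(t, pm25, cumulative=False):
    """Least-squares line through one woman's PM2.5 series.

    t: years since start of follow-up; pm25: annual values at her address.
    With cumulative=True the line is fit to the running (cumulative) average.
    Returns (intercept, slope).
    """
    pm25 = np.asarray(pm25, dtype=float)
    if cumulative:
        pm25 = np.cumsum(pm25) / np.arange(1, pm25.size + 1)
    slope, intercept = np.polyfit(np.asarray(t, dtype=float), pm25, deg=1)
    return intercept, slope


def event_time_linear_exposure(u, linpred, intercept, slope, beta_pm,
                               lag=0.0, scale=0.0002, shape=0.49):
    """Solve S(t) = u when exposure is z(t) = intercept + slope * (t - lag).

    Assumed hazard: scale * exp(shape * t) * exp(linpred + beta_pm * z(t)).
    Requires shape + beta_pm * slope != 0.
    """
    rate = shape + beta_pm * slope  # combined growth rate of the log-hazard
    const = scale * np.exp(linpred + beta_pm * (intercept - slope * lag))
    arg = 1.0 - rate * np.log(u) / const
    with np.errstate(invalid="ignore", divide="ignore"):
        t = np.log(arg) / rate
    # arg <= 0: the cumulative hazard never reaches -log(u), so no event occurs
    return np.where(arg > 0, t, np.inf)
```

Setting `beta_pm = 0` recovers the plain Gompertz draw, which gives a quick sanity check on the null (HR = 1.0) scenario.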
In both scenarios, women whose simulated survival time preceded the end of follow-up (2008) were randomly classified as either a case or censored, based on a draw from a uniform distribution with the probability of being a case set to 0.6. This value reflects the proportion observed from year to year in BWHS among women whose follow-up time was shorter than the study period. In other words, those with a simulated follow-up of less than 10 years have a 60% chance of being a case and a 40% chance of being censored. This proportion is fairly consistent over time in BWHS data.
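The case/censoring assignment described above can be sketched as follows. The function name is illustrative, and the handling of women whose simulated time exceeds follow-up (administrative censoring at 10 years) is an assumption, since the text only describes those with shorter times.

```python
import numpy as np


def assign_event_status(times, follow_up=10.0, p_case=0.6, rng=None):
    """Return (observed_time, status) with status 1 = case, 0 = censored.

    Women whose simulated time is below `follow_up` become cases with
    probability `p_case`; the rest of that group are censored at their
    simulated time. Times at or beyond `follow_up` are censored at
    `follow_up` (an assumption, not stated in the text).
    """
    rng = np.random.default_rng() if rng is None else rng
    times = np.asarray(times, dtype=float)
    before_end = times < follow_up
    # A uniform draw below p_case marks a case
    is_case = before_end & (rng.uniform(size=times.shape) < p_case)
    observed = np.minimum(times, follow_up)
    return observed, is_case.astype(int)
```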
For each exposure scenario, we postulated three hazard ratios (HR) describing the association of PM2.5 with hypertension: 1.0, 1.2, or 1.5, for a total of 6 simulation scenarios (three HRs for each of the two exposure scenarios). We simulated 1000 datasets for each scenario.
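One way to enumerate this design is sketched below. The scenario labels and the `per_unit` argument are hypothetical: the text does not state the PM2.5 increment to which each HR refers, so the conversion from HR to a log-hazard coefficient is shown for an arbitrary increment.

```python
import itertools
import math

HAZARD_RATIOS = (1.0, 1.2, 1.5)
EXPOSURE_MODELS = ("one_year_lag", "cumulative_average")  # illustrative labels
N_REPLICATES = 1000


def build_scenarios(per_unit=1.0):
    """List the simulation scenarios with the coefficient implied by each HR.

    per_unit: PM2.5 increment the HR refers to (hypothetical; not in the text).
    """
    return [
        {"exposure": model, "hr": hr, "beta_pm": math.log(hr) / per_unit}
        for model, hr in itertools.product(EXPOSURE_MODELS, HAZARD_RATIOS)
    ]
```

Each of the resulting six entries would then be run `N_REPLICATES` times, ideally with a distinct random seed per replicate so that results are reproducible.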