We now describe the likelihood contribution of individuals pooled at a single node. Consider matched case-control sets at an arbitrary node following NCC subcohort sampling, along with measurements of a single exposure and confounder. Denote the outcome event indicators of the -th matched set by and the random variable version by , the observed survival times by with for all representing the event time of the matched set, the exposure variable by , and the confounder by for . The time-matched samples are now randomly aggregated based on event status. For an arbitrary pool size and matched sets, the pooled data generated comprise matched samples of aggregate covariates and outcome events. The likelihood contribution of the pooled matched set is expressed as
where represents the minimum observed time of the cases or controls constituting that pool set, , and for all .
represents the set:
Assuming a PH model for all , the probabilities give
For all ,
where denotes the baseline hazard specific to the -th observation of the matched subcohort for and represent the survival terms common to the case and control matched sets. We can thus rewrite the likelihood contribution to the pooled matched set as
The product of the likelihood contribution over all the pooled matched sets gives the expression
which has the same form as the NCC subcohort likelihood in equation (3), with the same regression parameters. Thus, predictions and inference can be conducted using the pooled data instead of individual-level data. The consistency of the pooled logistic likelihood in estimating the parameters of the individual-level likelihood has been shown by Saha-Chaudhuri et al.36 Our derivation closely follows the conditional logistic likelihood derivations of Clayton and Hills,37 Langholz and Goldstein,30 and Langholz and Clayton.38
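As a reading aid, the final form of the pooled likelihood can be sketched as follows. The notation is assumed for illustration and may differ from the symbols used in the derivation above: $X^{p}_{ij}$ and $Z^{p}_{ij}$ denote the summed exposure and confounder values for pool $j$ of the $i$-th pooled matched set, with $j = 0$ indexing the pooled cases and $j = 1, \dots, m$ the pooled controls, and $\beta$ and $\gamma$ denote the corresponding log HRs.

```latex
L^{p}(\beta, \gamma)
  = \prod_{i}
    \frac{\exp\!\left(\beta X^{p}_{i0} + \gamma Z^{p}_{i0}\right)}
         {\sum_{j=0}^{m} \exp\!\left(\beta X^{p}_{ij} + \gamma Z^{p}_{ij}\right)}
```

In this sketch, the factors shared by the numerator and denominator under the PH model have already cancelled, which is why only the aggregate covariates and the regression parameters remain.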
Using the well-established equivalence between the likelihood of the NCC subcohort and the likelihood of conditional logistic regression,39,36 inference on the MLEs can be carried out with readily available packages for conditional logistic regression or stratified Cox regression.34 The estimated parameters are interpreted as log HRs rather than the traditional log odds ratios derived from the conditional logistic likelihood. Moreover, standard inference techniques applicable to the conditional logistic likelihood can be employed to estimate the SEs of the pooled subcohort estimators. Of note, the units of analysis for the pooled NCC subcohort are the pools themselves, as opposed to individual measurements.
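To make the role of the pools as units of analysis concrete, the following is a minimal Python sketch of a conditional logistic log-likelihood evaluated on pooled matched sets. It is an illustration under stated assumptions, not the authors' implementation: the function names `cond_logit_loglik` and `pool_sets`, the convention that row 0 of each set holds the case, and the pooling of consecutive (rather than randomly selected) matched sets are all hypothetical choices made here for brevity.

```python
import numpy as np


def cond_logit_loglik(beta, sets):
    """Conditional logistic log-likelihood.

    sets: list of (k, p) arrays, one per matched (or pooled matched) set.
    Assumption: row 0 holds the case (or pooled cases), rows 1..k-1 the controls.
    """
    beta = np.asarray(beta, dtype=float)
    total = 0.0
    for covs in sets:
        eta = covs @ beta                      # linear predictor per row
        shift = eta.max()                      # stabilise the log-sum-exp
        total += eta[0] - (shift + np.log(np.exp(eta - shift).sum()))
    return total


def pool_sets(sets, pool_size):
    """Sum covariates across consecutive groups of `pool_size` matched sets.

    Assumption: all sets share the same shape, so cases are summed with
    cases (row 0) and the j-th controls with the j-th controls.
    Leftover sets that do not fill a complete pool are dropped.
    """
    pooled = []
    for start in range(0, len(sets) - pool_size + 1, pool_size):
        pooled.append(np.sum(sets[start:start + pool_size], axis=0))
    return pooled


# Simulated 1:1 matched sets with two covariates (exposure, confounder).
rng = np.random.default_rng(0)
matched_sets = [rng.normal(size=(2, 2)) for _ in range(6)]
pooled_sets = pool_sets(matched_sets, pool_size=3)

# The same likelihood function is applied, with pools as the units of analysis.
loglik_pooled = cond_logit_loglik([0.5, -0.2], pooled_sets)
```

In practice the maximisation would be delegated to an existing conditional logistic or stratified Cox routine; the sketch only shows that the pooled data enter the same likelihood form, with one stratum per pooled matched set.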
As mentioned earlier, the AC receives pooled NCC subcohorts (see Table 1) from each contributing node. These include partial information on the observed event times of the pooled cases, the number of individuals making up each risk set at the node level if it exceeds five individuals, the pool event status (1 if cases are pooled and 0 if controls are pooled), and the corresponding aggregate covariate values. The log HRs associated with the covariates can be estimated from the pooled subcohorts without the matched event times. To estimate the overall survival curves, however, the individual event times of the subjects making up the pools would need to be recovered.