HR estimation with pooled NCC likelihood

Lamin Juwara
Yi Archer Yang
Ana M Velly
Paramita Saha-Chaudhuri

We now describe the likelihood contribution of individuals pooled at a single node. Consider 1:m matched case-control sets at an arbitrary node following NCC subcohort sampling, along with measurements of a single exposure and a single confounder. Denote the outcome event indicators of the $i$-th matched set by $(\delta_{i1},\delta_{i2},\ldots,\delta_{i,m+1})$ and the corresponding random variables by $(D_{i1},D_{i2},\ldots,D_{i,m+1})$, the observed survival times by $(Y_{i1},Y_{i2},\ldots,Y_{i,m+1})$ with $Y_{ij}=Y_i$ for all $j$ representing the event time of the matched set, the exposure variable by $(Z^{(e)}_{i1},Z^{(e)}_{i2},\ldots,Z^{(e)}_{i,m+1})$, and the confounder by $(Z^{(c)}_{i1},Z^{(c)}_{i2},\ldots,Z^{(c)}_{i,m+1})$ for $i=1,2,\ldots,n$. The time-matched samples are then randomly aggregated based on event status. For an arbitrary pool size $\kappa$ and $n$ matched sets, the pooled data comprise $n/\kappa$ matched sets of aggregate covariates and outcome events. The likelihood contribution of the pooled matched set $i\in\{1,2,\ldots,n/\kappa\}$ is expressed as

where $Y_{ij}$ represents the minimum observed time of the cases or controls constituting that pooled set, $Z^{(e)}_{ij}=\kappa^{-1}\sum_{i=1}^{\kappa}Z^{(e)}_{ij}$, and $Z^{(c)}_{ij}=\kappa^{-1}\sum_{i=1}^{\kappa}Z^{(c)}_{ij}$ for all $j$ $(j=1,\ldots,m+1)$.

D represents the set:

Assuming a PH model, for all $j:\delta_{ij}=0$ the probabilities give

For all $j:\delta_{ij}=1$,

where $\lambda_{0i}(Y_{ij})$ denotes the baseline hazard specific to the $i$-th observation of the matched subcohort for $j:\delta_{ij}=1$, and $Q_{ij}(Y_i)=\exp\left(-\exp\left(\beta_1 Z^{(e)}_{ij}+\beta_2 Z^{(c)}_{ij}\right)\int_0^{Y_{ij}}\lambda_{0i}(\tau)\,d\tau\right)$ represents the survival terms common to the case and control matched sets. We can thus rewrite the likelihood contribution of the pooled matched set as

The product of the likelihood contributions $\Pr(D_{i1}=1\mid\cdot)$ over all the pooled matched sets $i\in\{1,2,\ldots,n/\kappa\}$ gives the expression

which results in the same likelihood form as the NCC subcohort likelihood in equation (3) with the same regression parameters. Thus, predictions and inference can be conducted using the pooled data instead of individual-level data. The consistency of the pooled logistic likelihood in estimating the parameters of the individual-level likelihood has been shown by Saha-Chaudhuri et al.36 Our derivation closely follows the conditional logistic likelihood derivations of Clayton and Hills,37 Langholz and Goldstein,30 and Langholz and Clayton.38
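As a reading aid, the conditional logistic form referred to above can be written out for 1:m matched sets. This is a sketch of the standard form, with the case indexed as $j=1$ and the covariates and coefficients named as in the text. It is not a transcription of equation (3), and the pooled version is assumed here to substitute the aggregate covariates of each pooled matched set for the individual-level values.

```latex
L(\beta_1,\beta_2)
  = \prod_{i}
    \frac{\exp\left(\beta_1 Z^{(e)}_{i1} + \beta_2 Z^{(c)}_{i1}\right)}
         {\sum_{j=1}^{m+1} \exp\left(\beta_1 Z^{(e)}_{ij} + \beta_2 Z^{(c)}_{ij}\right)}
```

The baseline hazard does not appear in this form, which is why the regression coefficients can be estimated without specifying it.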

Using the well-established equivalence between the likelihood of the NCC subcohort and the likelihood of conditional logistic regression,39,36 inference on the MLEs can be carried out with readily available packages for conditional logistic regression or stratified Cox regression.34 The estimated parameters are interpreted as log HRs rather than the traditional log odds ratios derived from the conditional logistic likelihood. Moreover, standard inference techniques applicable to the conditional logistic likelihood can be employed to estimate the SEs of the pooled subcohort estimators. Of note, the units of analysis for the pooled NCC subcohort are the pools themselves, as opposed to individual measurements.
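To make the estimation step concrete, the following is a minimal sketch of fitting a conditional logistic likelihood with two covariates (exposure and confounder) by Newton-Raphson, using only the Python standard library. The function names `neg_log_lik` and `fit_conditional_logistic` and the input layout (each matched set is a list of `(z_exposure, z_confounder)` tuples with the case, or case pool, in position 0) are assumptions made for illustration and do not come from the protocol. In practice an established conditional logistic or stratified Cox routine would be used, as noted above.

```python
import math


def _softmax_weights(beta, members):
    """Return within-set weights exp(eta_j) / sum_k exp(eta_k)."""
    etas = [beta[0] * ze + beta[1] * zc for ze, zc in members]
    top = max(etas)  # subtract the maximum for numerical stability
    w = [math.exp(e - top) for e in etas]
    total = sum(w)
    return [x / total for x in w]


def neg_log_lik(beta, sets):
    """Negative conditional logistic log-likelihood.

    sets: list of matched sets; each set is a list of
    (z_exposure, z_confounder) tuples with the case in position 0
    (hypothetical layout chosen for this sketch).
    """
    return -sum(math.log(_softmax_weights(beta, m)[0]) for m in sets)


def fit_conditional_logistic(sets, n_iter=25, tol=1e-10):
    """Newton-Raphson for (beta_1, beta_2); returns (estimates, SEs)."""
    beta = [0.0, 0.0]
    for _ in range(n_iter):
        score = [0.0, 0.0]
        info = [[0.0, 0.0], [0.0, 0.0]]
        for members in sets:
            w = _softmax_weights(beta, members)
            mean = [sum(wj * z[k] for wj, z in zip(w, members))
                    for k in range(2)]
            for k in range(2):
                # score: case covariate minus weighted set average
                score[k] += members[0][k] - mean[k]
                for l in range(2):
                    second = sum(wj * z[k] * z[l]
                                 for wj, z in zip(w, members))
                    # information: weighted covariance within the set
                    info[k][l] += second - mean[k] * mean[l]
        det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
        inv = [[info[1][1] / det, -info[0][1] / det],
               [-info[1][0] / det, info[0][0] / det]]
        step = [inv[k][0] * score[0] + inv[k][1] * score[1]
                for k in range(2)]
        beta = [beta[k] + step[k] for k in range(2)]
        if max(abs(s) for s in step) < tol:
            break
    # SEs from the inverse information matrix of the last iteration
    se = [math.sqrt(inv[0][0]), math.sqrt(inv[1][1])]
    return beta, se
```

When the rows passed in are pooled matched sets, each row is one pool, which matches the remark that the pools are the units of analysis. The sketch has no step-halving or separation checks, so it is only suitable for well-behaved data.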

As mentioned earlier, the AC receives pooled NCC subcohorts (see Table 1) from each contributing node. This includes partial information on the observed event times of the pooled cases, the number of individuals making up each riskset at the node level if it is more than five individuals, the pool event status (1 if cases are pooled and 0 if controls are pooled), and the corresponding aggregate covariate values. While the log HRs associated with the covariates can be estimated from the pooled subcohorts without any need for the matched event times, estimating the overall survival curves would require recovering the individual event times of the subjects making up the pools.
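The kind of pooled record described here can be illustrated with a short sketch of node-level pooling. Everything named below is hypothetical: the `PooledRecord` fields, the `pool_matched_sets` function, and the input layout (each matched set is a list of dicts with keys `time`, `z_e`, `z_c`, case first). The sketch also assumes that pooled covariates are averages over the $\kappa$ members, that controls are pooled position by position, and that $n$ is a multiple of $\kappa$. Riskset sizes are not modelled.

```python
import random
from dataclasses import dataclass


@dataclass
class PooledRecord:
    pooled_set: int      # index of the pooled matched set
    event_status: int    # 1 if cases were pooled, 0 if controls
    min_time: float      # minimum observed time among pooled members
    z_exposure: float    # aggregate (mean) exposure
    z_confounder: float  # aggregate (mean) confounder


def pool_matched_sets(matched_sets, kappa, seed=None):
    """Randomly group matched sets into pools of size kappa.

    Cases are aggregated with cases and controls with controls,
    so event status is preserved within each pooled record.
    """
    n = len(matched_sets)
    if kappa < 1 or n % kappa != 0:
        raise ValueError("number of matched sets must be a multiple of kappa")
    order = list(range(n))
    random.Random(seed).shuffle(order)  # random assignment to pools
    set_size = len(matched_sets[0])     # m + 1 members per matched set
    records = []
    for pool_idx in range(n // kappa):
        chosen = [matched_sets[i]
                  for i in order[pool_idx * kappa:(pool_idx + 1) * kappa]]
        for j in range(set_size):
            members = [s[j] for s in chosen]
            records.append(PooledRecord(
                pooled_set=pool_idx,
                event_status=1 if j == 0 else 0,
                min_time=min(mem["time"] for mem in members),
                z_exposure=sum(mem["z_e"] for mem in members) / kappa,
                z_confounder=sum(mem["z_c"] for mem in members) / kappa,
            ))
    return records
```

With four 1:1 matched sets and `kappa=2`, this yields two pooled matched sets of two records each, one case pool and one control pool per set.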
