Because we sampled using multi-stage stratified cluster random sampling, we needed to calculate the design effect to account for the increased variance expected with cluster random sampling compared with simple random sampling. We presumed a mean unadjusted prevalence of sero-positivity for anti-SARS-CoV-2 antibody of 5%, with a standard deviation of 1%. The intra-cluster correlation coefficient (ICC or ρ), which reflects how much of the total variation lies between clusters, was determined to be 0.20. We decided to sample 50 participants per cluster. Accordingly, we calculated the design effect using the formula:
$$\mathrm{DEFF} = 1 + (\mathrm{ppC} - 1)\,\rho$$
Where:
DEFF = Design Effect
ppC = Persons per cluster (here 50)
ρ = Intra-cluster correlation coefficient (here 0.2)
As seen in Table 9, for multi-stage stratified cluster sampling with size of each cluster taken to be fifty (50), the DEFF was derived to be 10.8.
Table 9. Calculation of design effect.
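As an illustrative check of the arithmetic (a minimal sketch, not part of the original protocol), the design effect can be reproduced in Python. The function name `design_effect` and its parameter names are hypothetical, chosen only for this example; the inputs are the values stated above.

```python
def design_effect(persons_per_cluster: int, icc: float) -> float:
    """Design effect for cluster sampling: DEFF = 1 + (ppC - 1) * rho."""
    return 1 + (persons_per_cluster - 1) * icc


# Values stated in the text: 50 persons per cluster, rho = 0.2
deff = design_effect(persons_per_cluster=50, icc=0.2)
print(round(deff, 1))  # 10.8
```

With 50 persons per cluster, each additional 0.01 of ρ adds 0.49 to the design effect, which is why the assumed ρ has such a strong influence on the final sample size.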
The base sample size for the study was estimated using the formula:
$$n = \frac{Z_{1-\alpha/2}^{2}\; p\,(1 - p)}{d^{2}}$$
Where:
Z1-α/2 = standard normal variate; at a 5% significance level (α = 0.05, two-sided), it is 1.96
p = prevalence of the health condition, here positive test for SARS-CoV-2, assumed at 5%
d = absolute precision, here taken to be 2%
Using this formula, we arrived at a base sample size of 457.
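The base sample size can be verified with a short sketch, assuming the standard single-proportion formula n = Z² p (1 − p) / d² and rounding up to the next whole participant. The function name `base_sample_size` is hypothetical and used only for illustration.

```python
import math


def base_sample_size(z: float, p: float, d: float) -> int:
    """Sample size for estimating a proportion p with absolute precision d."""
    n = (z ** 2) * p * (1 - p) / (d ** 2)
    # Round up, since a fraction of a participant cannot be sampled
    return math.ceil(n)


# Values stated in the text: Z = 1.96, p = 5%, d = 2%
n = base_sample_size(z=1.96, p=0.05, d=0.02)
print(n)  # 457 (the unrounded value is about 456.19)
```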
Since the design effect was calculated to be 10.8, the base sample size was corrected for clustering by multiplying it by the design effect:
$$N = n \times \mathrm{DEFF}$$
Where:
N = Corrected sample size
n = Base sample size (here 457)
DEFF = Design Effect (here 10.8)
The corrected sample size was therefore calculated to be 4936 (457 × 10.8 = 4935.6), which we rounded up to 5000.
Since we had decided to sample 50 participants per cluster, we needed to collect data from 5000/50 = 100 clusters.
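The final two steps can be sketched in the same way. This is an illustrative example only: the function names `corrected_sample_size` and `clusters_needed` are hypothetical, and the planned total of 5000 is taken directly from the text as an input, since the rounding rule itself is not specified.

```python
import math


def corrected_sample_size(base_n: int, deff: float) -> int:
    """Inflate the base sample size by the design effect: N = n * DEFF."""
    return math.ceil(base_n * deff)


def clusters_needed(total_sample: int, persons_per_cluster: int) -> int:
    """Number of clusters required to reach the total sample."""
    return math.ceil(total_sample / persons_per_cluster)


corrected = corrected_sample_size(base_n=457, deff=10.8)
planned_total = 5000  # rounded figure stated in the text (assumed input)
clusters = clusters_needed(planned_total, persons_per_cluster=50)
print(corrected, clusters)  # 4936 100
```

Without the rounding to 5000, the unrounded figure of 4936 would correspond to 99 clusters of 50, so the rounding adds one cluster of margin.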