Genotype data simulations were performed in the R statistical computing language (v3.6.1; https://www.r-project.org/). To create case-control datasets, genotypes for cases and controls were simulated using a Poisson distribution with λ equal to the mean number of events (variant carriers) in the given interval, expressed as:
λ = N × RR × MAF
where N denotes the sample size, RR denotes the relative breast cancer risk of the causal variant and MAF denotes the minor allele frequency of the variant in the general population. Ages were simulated using a normal distribution, with mean and standard deviation following the gene-specific age distribution in the CARRIERS population-based study (Hu et al., 2021).
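As an illustration only, one simulated replicate could be generated along the following lines. This is a minimal sketch, not the authors' code: it assumes the Poisson mean is the product of sample size, relative risk and minor allele frequency, that controls have a relative risk of 1, and that all ages are drawn from a single normal distribution (a simplification; the text does not state how ages are assigned by carrier status). The function name `simulate_replicate`, its arguments, and the age parameters are hypothetical placeholders.

```r
# Minimal sketch of one simulated case-control replicate.
# All names and the age parameters are hypothetical.
simulate_replicate <- function(n, rr, maf, age_mean, age_sd) {
  # Assumed Poisson mean: lambda = N * RR * MAF (RR = 1 for controls).
  # Counts are capped at n so they never exceed the group size.
  n_case_carriers    <- min(rpois(1, lambda = n * rr * maf), n)
  n_control_carriers <- min(rpois(1, lambda = n * maf), n)

  data.frame(
    status  = rep(c(1L, 0L), each = n),  # 1 = case, 0 = control
    carrier = c(
      rep(c(1L, 0L), times = c(n_case_carriers, n - n_case_carriers)),
      rep(c(1L, 0L), times = c(n_control_carriers, n - n_control_carriers))
    ),
    # Simplification: one normal distribution for every individual.
    age     = rnorm(2 * n, mean = age_mean, sd = age_sd)
  )
}

set.seed(1)
# age_mean and age_sd are placeholders, not the CARRIERS estimates.
dat <- simulate_replicate(n = 20000, rr = 4, maf = 0.0001,
                          age_mean = 60, age_sd = 10)
table(status = dat$status, carrier = dat$carrier)
```

In practice the age mean and standard deviation would be replaced by the gene-specific values reported in the CARRIERS study.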
Genotype data were simulated for variants conferring an RR of 1 (indicating no increased risk), 2, 3, 4, 5, 6, 7, 8, 9 or 10; a minor allele frequency in controls of 0.0001, 0.00005 or 0.00003; and a sample size of N = 20,000 (20,000 breast cancer cases and 20,000 controls), 30,000 (30,000 breast cancer cases and 30,000 controls) or 50,000 (50,000 breast cancer cases and 50,000 controls). For each of these 90 scenarios (10 RR values × 3 minor allele frequencies × 3 sample sizes), we simulated 10,000 replicates.
Additionally, to account for the possibility that age information is not available, we repeated the analysis using the same age for all individuals.
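The scenario grid described above can be enumerated as in the sketch below. It is illustrative only: the column names are hypothetical, and the Poisson means assume that the expected carrier count is the product of sample size, relative risk and minor allele frequency, with a relative risk of 1 in controls.

```r
# Enumerate every combination of relative risk, minor allele frequency
# and per-group sample size (column names are hypothetical).
scenarios <- expand.grid(
  rr  = 1:10,
  maf = c(0.0001, 0.00005, 0.00003),
  n   = c(20000, 30000, 50000)
)
nrow(scenarios)  # 90 scenarios

n_replicates <- 10000

# Assumed Poisson means for carrier counts in cases and controls.
scenarios$lambda_cases    <- with(scenarios, n * rr * maf)
scenarios$lambda_controls <- with(scenarios, n * maf)

# Carrier counts for all replicates of one scenario, drawn in one call.
set.seed(1)
s <- scenarios[1, ]
case_carriers    <- rpois(n_replicates, lambda = s$lambda_cases)
control_carriers <- rpois(n_replicates, lambda = s$lambda_controls)
```

Drawing all replicates of a scenario with a single vectorised `rpois` call keeps the simulation fast; looping over the rows of `scenarios` then covers the full design.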