Simulated case-control dataset

Maria Zanti
Denise G. O'Mahony
Michael T. Parsons
Hongyan Li
Joe Dennis
Kristiina Aittomäki
Irene L. Andrulis
Hoda Anton-Culver
Kristan J. Aronson
Annelie Augustinsson
Heiko Becher
Stig E. Bojesen
Manjeet K. Bolla
Hermann Brenner
Melissa A. Brown
Saundra S. Buys
Federico Canzian
Sandrine M. Caputo
Jose E. Castelao
Jenny Chang-Claude
Kamila Czene
Mary B. Daly
Arcangela De Nicolo
Peter Devilee
Thilo Dörk
Alison M. Dunning
Miriam Dwek
Diana M. Eccles
Christoph Engel
D. Gareth Evans
Peter A. Fasching
Manuela Gago-Dominguez
Montserrat García-Closas
José A. García-Sáenz
Aleksandra Gentry-Maharaj
Willemina R.R. Geurts-Giele
Graham G. Giles
Gord Glendon
Mark S. Goldberg
Encarna B. Gómez Garcia
Melanie Güendert
Pascal Guénel
Eric Hahnen
Christopher A. Haiman
Per Hall
Ute Hamann
Elaine F. Harkness
Frans B.L. Hogervorst
Antoinette Hollestelle
Reiner Hoppe

Genotype data were simulated in R (v3.6.1; https://www.r-project.org/). To create case-control datasets, genotypes for cases and controls were simulated from a Poisson distribution whose rate parameter λ equals the mean number of events (variant carriers) expected in each group, expressed as:

where N denotes the sample size, RR the relative breast cancer risk conferred by the causal variant, and MAF the minor allele frequency of the variant in the general population. Ages were simulated from a normal distribution whose mean and standard deviation followed the gene-specific age distribution in the population-based CARRIERS study (Hu et al., 2021).
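A minimal Python sketch of one simulated replicate follows. It is illustrative only: the original simulations were written in R, and the protocol's expression for λ is not reproduced here, so the hypothetical helper `expected_carriers` assumes λ = N × MAF × RR for cases (RR = 1 for controls). The age means and standard deviations are placeholders for the gene-specific CARRIERS values, and drawing ages only for the simulated carriers is likewise an assumption of the sketch. Poisson draws use Knuth's multiplication method, which is adequate for the small λ values of rare variants.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def expected_carriers(n, maf, rr=1.0):
    """HYPOTHETICAL rate: lambda = N * MAF * RR (RR = 1 for controls).
    Substitute the protocol's own expression for lambda."""
    return n * maf * rr


def simulate_replicate(n, maf, rr, case_age, control_age, rng):
    """Simulate carrier counts among N cases and N controls, plus carrier ages.

    case_age and control_age are (mean, sd) tuples; the values used below are
    placeholders, not the CARRIERS gene-specific estimates.
    """
    n_case = poisson(expected_carriers(n, maf, rr), rng)
    n_ctrl = poisson(expected_carriers(n, maf), rng)
    return {
        "case_carriers": n_case,
        "control_carriers": n_ctrl,
        # Normally distributed ages, drawn here for carriers only (assumption)
        "case_ages": [rng.gauss(*case_age) for _ in range(n_case)],
        "control_ages": [rng.gauss(*control_age) for _ in range(n_ctrl)],
    }


rng = random.Random(2021)
rep = simulate_replicate(n=20_000, maf=0.0001, rr=5,
                         case_age=(55.0, 12.0), control_age=(57.0, 11.0), rng=rng)
print(rep["case_carriers"], rep["control_carriers"])
```

Seeding a dedicated `random.Random` instance keeps each replicate reproducible without touching global random state.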

Simulations were carried out for variants conferring an RR of 1 (no increased risk), 2, 3, 4, 5, 6, 7, 8, 9 or 10; a minor allele frequency in controls of 0.0001, 0.00005 or 0.00003; and a sample size of N = 20,000, 30,000 or 50,000 (that is, N breast cancer cases and N controls). For each of the resulting 90 scenarios (10 RR values × 3 allele frequencies × 3 sample sizes), we simulated 10,000 replicates.
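The 90 scenarios are the full factorial combination of 10 RR values, 3 MAFs and 3 sample sizes (10 × 3 × 3 = 90). The Python sketch below builds that grid and a per-scenario replicate loop. As elsewhere in this illustration, the rates N × MAF × RR for cases and N × MAF for controls are a hypothetical stand-in for the protocol's own λ, and `run_scenario` is a hypothetical helper name.

```python
import itertools
import math
import random

RR_VALUES = range(1, 11)                 # RR = 1 (no increased risk) to 10
MAF_VALUES = (0.0001, 0.00005, 0.00003)  # minor allele frequency in controls
N_VALUES = (20_000, 30_000, 50_000)      # N cases and N controls per dataset
N_REPLICATES = 10_000                    # replicates per scenario

scenarios = list(itertools.product(RR_VALUES, MAF_VALUES, N_VALUES))
print(len(scenarios))  # 90


def poisson(lam, rng):
    """Knuth's multiplication method; fine for the small lambdas used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def run_scenario(rr, maf, n, n_replicates, rng):
    """Return (case_carriers, control_carriers) for each replicate.
    HYPOTHETICAL rates: N * MAF * RR for cases, N * MAF for controls."""
    lam_case, lam_ctrl = n * maf * rr, n * maf
    return [(poisson(lam_case, rng), poisson(lam_ctrl, rng))
            for _ in range(n_replicates)]


rng = random.Random(1)
# Small demo run; the full design uses N_REPLICATES for every scenario.
demo = run_scenario(*scenarios[0], n_replicates=100, rng=rng)
```

Running all 90 × 10,000 replicates in pure Python is slow; a vectorized sampler such as NumPy's `Generator.poisson` would be the practical choice at full scale.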

To account for settings in which age information is unavailable, we repeated the analysis assigning the same age to all individuals.
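The age-free sensitivity analysis amounts to a switch in how ages are generated. In this Python sketch, `simulate_ages` is a hypothetical helper, and using the distribution mean as the shared age is an assumption, since the protocol does not state which common age was used.

```python
import random


def simulate_ages(n_individuals, mean, sd, rng, age_available=True, shared_age=None):
    """Return one age per individual.

    age_available=True  -> independent draws from Normal(mean, sd)
    age_available=False -> the same age for everyone (defaults to `mean`,
                           an assumed placeholder value)
    """
    if age_available:
        return [rng.gauss(mean, sd) for _ in range(n_individuals)]
    age = mean if shared_age is None else shared_age
    return [age] * n_individuals


rng = random.Random(3)
with_age = simulate_ages(5, 55.0, 12.0, rng)
no_age = simulate_ages(5, 55.0, 12.0, rng, age_available=False)
print(no_age)  # [55.0, 55.0, 55.0, 55.0, 55.0]
```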
