We examined genetic and environmental covariance estimation accuracy in the presence of multiple genetic covariance and environmental covariance. To do so, we divided SNPs into two categories following [20]: one functional category (132,557 SNPs) that includes all SNPs inside coding, UTR, promoter, exon or intron regions; and another non-functional category (157,437 SNPs) that include the remaining SNPs. We set the heritability of both traits to be 0.5 and evenly divided the heritability explained by each category of SNPs to be 0.25. For genetic covariance estimation, we set the genetic covariance of the non-functional annotation regions to be zero and varied the genetic covariance of the functional annotation regions from -0.2 to 0.2 by 0.025 following [11]. For environmental covariance estimation, we set the heritability of both traits to be 0.5 and evenly divided the heritability explained by each category of SNPs to be 0.25. In addition, we set the genetic covariance for the functional annotation region to be 0.05, set the genetic covariance for the non-functional annotation region to be 0, and varied the environmental covariance from -0.4 to 0.4 by a step of 0.05. A total of 85 scenarios (3 study designs x 1 genetic covariance x 17 environmental covariance values + 2 study designs x 1 environmental covariance x 17 genetic covariance values) were examined in setting IV.
Besides the above four main simulation settings, we also considered three additional simulation settings:
The dense genotype setting where we used 1,819,851 imputed WTCCC SNPs based on the 1,000 Genome phase three reference panel for simulations. We examined all 85 scenarios in the simulation setting III. The moderate heritability setting where we used a heritability of 0.15 instead of 0.5 and examined all 85 scenarios in simulation setting III. Note that the heritability of 0.15 is close to the mean heritability estimate in our real data application (mean = 0.155). The mismatched LD setting where the LD score is not computed from the data at hand but from a separate reference panel. Specifically, we used LD score computed from 503 individuals with European ancestry from the 1,000 Genomes project reference panel instead of that computed based on WTCCC. We considered mismatched LD setting in all 85 scenarios in setting III. The overlap sample number misspecification setting where we examined the two partially overlapping study design in simulation setting III and set ns incorrectly to be 250, 500, 1,250, or 1,500 (while the true number is 1,000). We examined a total of 136 scenarios (4 ns choices x 1 genetic covariance x 17 environmental covariance values + 4 ns choices x 1 environmental covariance x 17 genetic covariance values) in this setting.
For each scenario in the simulation settings III and IV, we performed 1,000 simulation replicates. For each scenario in the remaining six simulation settings, we performed 100 simulation replicates. We calculated type I error and power based on these replicates to check the performance of GECKO under different sample compositions. Because different methods have different control of type I error, we compared the power of different methods at a fixed type I error rate instead of a nominal p-value threshold. Specifically, we ranked p-values from different methods under the null, obtained for each method its p-value threshold that corresponds to a 5% type I error rate, and used this p-value threshold for the given method as the cutoff to calculate its power. Therefore, a different p-value threshold is used for each different method, ensuring fair power comparison at a fixed type I error rate.
Do you have any questions about this protocol?
Post your question to gather feedback from the community. We will also invite the authors of this article to respond.
Tips for asking effective questions
+ Description
Write a detailed description. Include all information that will help others answer your question including experimental processes, conditions, and relevant images.