To investigate the associations between genetically predicted circulating protein levels and PDAC risk, we applied the validated protein genetic prediction models to summary statistics from a large genome-wide association study (GWAS) of PDAC risk. We used data from a GWAS conducted in the PanScan and PanC4 consortia, downloaded from the database of Genotypes and Phenotypes (dbGaP), comprising 8,275 PDAC cases and 6,723 controls of European ancestry. Detailed information on this dataset has been described elsewhere [17, 20, 32]. Briefly, 4 GWASs (PanScan I, PanScan II, PanScan III, and PanC4) were genotyped using the Illumina HumanHap550, 610-Quad, OmniExpress, and OmniExpressExome arrays, respectively. Standard QC procedures were performed according to the consortia guidelines [32]. Study participants who were related to each other, had sex discordance, had non-European genetic ancestry, had a low call rate (less than 98% in PanC4 and less than 94% in PanScan), or had missing information on age or sex were excluded. Duplicated SNPs and those with a high missing call rate (at least 2% in PanC4 and at least 6% in PanScan) or with violations of Hardy-Weinberg equilibrium (HWE; $P < 1 \times 10^{-4}$ in PanC4 and $P < 1 \times 10^{-7}$ in PanScan) were also removed. For SNP data from PanC4, those with minor allele frequency <0.005, with more than 2 discordant calls in duplicate samples, with more than 1 Mendelian error in HapMap control trios, and with a sex difference in allele frequency >0.2 or in heterozygosity >0.3 for autosomes/XY in European descendants were further removed. We performed genotype imputation using Minimac3 after prephasing with SHAPEIT, using the Haplotype Reference Consortium (r1.1 2016) as the reference panel [33, 34]. We retained imputed SNPs with an imputation quality of ≥0.3. The associations between individual genetic variants and PDAC risk were then estimated, adjusting for age, sex, and top principal components.
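The SNP-level QC thresholds described above can be illustrated with a minimal Python sketch. This is not the consortia pipeline: the `SnpStats` container, the `passes_qc` function, and the `QC_THRESHOLDS` table are hypothetical names introduced here for illustration. The sketch covers only the missing call rate, HWE, and minor allele frequency filters quoted in the text, and it omits the remaining PanC4 filters (discordant calls, Mendelian errors, and sex differences).

```python
from dataclasses import dataclass

# Thresholds quoted in the text, keyed by study group.
# The table layout and key names are assumptions made for this sketch.
QC_THRESHOLDS = {
    "PanC4": {"max_missing": 0.02, "min_hwe_p": 1e-4, "min_maf": 0.005},
    "PanScan": {"max_missing": 0.06, "min_hwe_p": 1e-7, "min_maf": None},
}


@dataclass
class SnpStats:
    """Hypothetical per-SNP summary used only for this illustration."""
    snp_id: str
    missing_rate: float  # fraction of samples with a missing call
    hwe_p: float         # P value from the HWE test
    maf: float           # minor allele frequency


def passes_qc(snp: SnpStats, study: str) -> bool:
    """Return True if the SNP survives the filters sketched here."""
    t = QC_THRESHOLDS[study]
    # Remove SNPs whose missing call rate is at least the threshold.
    if snp.missing_rate >= t["max_missing"]:
        return False
    # Remove SNPs that violate HWE (P below the study threshold).
    if snp.hwe_p < t["min_hwe_p"]:
        return False
    # The MAF filter is stated in the text for PanC4 only.
    if t["min_maf"] is not None and snp.maf < t["min_maf"]:
        return False
    return True
```

For example, a SNP with a 3% missing call rate would be removed under the PanC4 threshold but retained under the PanScan threshold.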
The TWAS/FUSION framework was used to assess the protein–PDAC risk associations by leveraging correlations between variants included in the prediction models, based on the phase III 1000 Genomes Project data for European populations [15]. We calculated the PWAS test statistic as $z = w^{\top} Z / (w^{\top} \Sigma_{s,s} w)^{1/2}$, where $Z$ is a vector of standardized effect sizes (Wald z-scores) of the SNPs for a given protein, $w$ is a vector of prediction weights for the abundance feature of the protein being tested, and $\Sigma_{s,s}$ is the linkage disequilibrium (LD) matrix of the SNPs, estimated from the 1000 Genomes Project as the LD reference panel. We used an FDR-corrected P value threshold of ≤0.05 to determine significant associations between genetically predicted protein concentrations and risk of PDAC.
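As an illustration of the test statistic, the following pure-Python sketch computes the z-score from a weight vector, a vector of GWAS z-scores, and an LD matrix, and then converts it to a two-sided P value. It is not the FUSION implementation; the function names are hypothetical. The text does not state which FDR procedure was used, so the Benjamini-Hochberg adjustment shown here is an assumption.

```python
import math


def pwas_z(weights, gwas_z, ld):
    """z = w'Z / sqrt(w' Sigma w) for one protein (illustrative only)."""
    n = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(
        weights[i] * ld[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )
    if variance <= 0:
        raise ValueError("w' Sigma w must be positive")
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided P value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values (assumed FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Under these assumptions, a protein would be called significant when its adjusted P value from `benjamini_hochberg` is at most 0.05.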