Imputation of the UK Biobank to the TOPMed panel and association analyses

Daniel Taliun; Daniel N. Harris; Michael D. Kessler; Jedidiah Carlson; Zachary A. Szpiech; Raul Torres; Sarah A. Gagliano Taliun; André Corvelo; Stephanie M. Gogarten; Hyun Min Kang; Achilleas N. Pitsillides; Jonathon LeFaive; Seung-been Lee; Xiaowen Tian; Brian L. Browning; Sayantan Das; Anne-Katrin Emde; Wayne E. Clarke; Douglas P. Loesch; Amol C. Shetty; Thomas W. Blackwell; Albert V. Smith; Quenna Wong; Xiaoming Liu; Matthew P. Conomos; Dean M. Bobo; François Aguet; Christine Albert; Alvaro Alonso; Kristin G. Ardlie; Dan E. Arking; Stella Aslibekyan; Paul L. Auer; John Barnard; R. Graham Barr; Lucas Barwick; Lewis C. Becker; Rebecca L. Beer; Emelia J. Benjamin; Lawrence F. Bielak; John Blangero; Michael Boehnke; Donald W. Bowden; Jennifer A. Brody; Esteban G. Burchard; Brian E. Cade; James F. Casella; Brandon Chalazan; Daniel I. Chasman; Yii-Der Ida Chen

This method is extracted from research article: Nature, Feb 2021

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program

DOI: 10.1038/s41586-021-03205-y

After phasing the UK Biobank genetic data (carried out on 81 chromosomal chunks using Eagle v.2.4), the phased data were converted from GRCh37 to GRCh38 using LiftOver¹¹². Imputation was performed using Minimac4¹¹¹.

We compared the correlation of genotypes between the exome-sequencing data released by the UK Biobank (following their SPB pipeline¹¹³) and the TOPMed-imputed genotypes. The comparison assessed 49,819 individuals and 3,052,260 autosomal variants that were found in both the exome-sequencing and TOPMed-imputed datasets (matched by chromosome, position and alleles, and with an imputation quality of at least 0.3 in the TOPMed-imputed data). We binned the variants by MAF, using the MAF from the exome data to define the bins, and averaged the per-variant Pearson correlations within each bin.
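The binning step can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `binned_mean_correlation`, the array layout (variants in rows, individuals in columns) and the bin edges are assumptions made for illustration, and the variants are assumed to have been matched and filtered already as described above.

```python
import numpy as np

def binned_mean_correlation(exome, imputed, bin_edges):
    """Average per-variant Pearson correlation within exome-MAF bins.

    exome     : (n_variants, n_samples) genotypes coded 0/1/2 (assumed layout)
    imputed   : (n_variants, n_samples) imputed dosages for the same variants
    bin_edges : increasing MAF cut points (hypothetical values chosen by the caller)
    """
    exome = np.asarray(exome, dtype=float)
    imputed = np.asarray(imputed, dtype=float)

    # Allele frequency from the exome genotypes, folded to a minor allele frequency.
    af = exome.mean(axis=1) / 2.0
    maf = np.minimum(af, 1.0 - af)

    # Per-variant Pearson correlation between sequenced genotypes and dosages.
    ex_c = exome - exome.mean(axis=1, keepdims=True)
    im_c = imputed - imputed.mean(axis=1, keepdims=True)
    denom = np.sqrt((ex_c ** 2).sum(axis=1) * (im_c ** 2).sum(axis=1))
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (ex_c * im_c).sum(axis=1) / denom  # NaN or inf when either row is constant

    # right=True gives bins of the form (lower, upper], so MAF = 0.5 falls in the last bin.
    bin_index = np.digitize(maf, bin_edges, right=True)
    means = {}
    for i in range(1, len(bin_edges)):
        in_bin = (bin_index == i) & np.isfinite(r)
        if in_bin.any():
            means[(bin_edges[i - 1], bin_edges[i])] = float(r[in_bin].mean())
    return means
```

Variants that are monomorphic in either dataset have an undefined correlation and are skipped here; how such variants were handled in the original analysis is not stated in the text.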

We tested single pLOF, nonsense, frameshift and essential splice-site variants⁸⁵,⁸⁶ for association with 1,419 PheCodes, which were constructed from composites of ICD-10 (International Classification of Diseases, 10th revision) codes to define cases and controls. Construction of the PheCodes has been described previously¹¹⁴. We performed the association analysis in the ‘white British’ individuals, which left 408,008 individuals after the following quality control criteria were applied: (1) participants had not withdrawn consent from the UK Biobank study as of the end of 2019; (2) ‘submitted gender’ matched ‘inferred sex’; (3) phased autosomal data were available; (4) not an outlier for the number of missing genotypes or heterozygosity; (5) no putative sex chromosome aneuploidy; (6) no excess of relatives; (7) not excluded from kinship inference; and (8) included in the UK Biobank-defined ‘white British’ ancestry subset. To perform the association analyses, we used a logistic mixed model test implemented in SAIGE¹¹⁴ with birth year and the top four principal components (computed from the white British subset) as covariates. For the pLOF burden tests, for each autosomal gene with at least two rare pLOF variants (n = 12,052 genes), a burden variable was created in which the dosages of rare pLOF variants were summed for each individual. This sum of dosages was tested for association with the 1,419 traits using SAIGE, with the same covariates as in the single-variant tests. For both the single-variant and the burden tests, we used 5 × 10⁻⁸ as the genome-wide significance threshold.
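The construction of the burden variable can be sketched as follows. This is only an illustration under stated assumptions: `build_burden` and its arguments are hypothetical names, the input is assumed to contain only variants already classified as rare pLOF (the rarity cut-off is not given in this section), and the association test itself, which the authors ran in SAIGE, is not reproduced.

```python
from collections import defaultdict

import numpy as np

def build_burden(dosages, variant_genes, min_variants=2):
    """Sum rare pLOF dosages per individual for each eligible gene.

    dosages       : (n_variants, n_samples) imputed dosages, rare pLOF variants only
    variant_genes : gene label for each row of `dosages` (hypothetical annotation)
    min_variants  : genes with fewer rare pLOF variants than this are skipped
    """
    dosages = np.asarray(dosages, dtype=float)

    # Group the row indices of the dosage matrix by gene.
    rows_by_gene = defaultdict(list)
    for row, gene in enumerate(variant_genes):
        rows_by_gene[gene].append(row)

    # One burden vector per gene: the per-individual sum of dosages across its variants.
    return {
        gene: dosages[rows].sum(axis=0)
        for gene, rows in rows_by_gene.items()
        if len(rows) >= min_variants
    }
```

Each returned vector has one value per individual and would take the place of a single-variant dosage in the association model, alongside the same covariates.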

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
