Genotyping and quality control

Oyesola O Ojewunmi
Titilope A Adeyemo
Ajoke I Oyetunji
Bassey Inyang
Afolashade Akinrindoye
Baraka S Mkumbe
Kate Gardner
Helen Rooks
John Brewin
Hamel Patel
Sang Hyuck Lee
Raymond Chung
Sara Rashkin
Guolian Kang
Reuben Chianumba
Raphael Sangeda
Liberata Mwita
Hezekiah Isa
Uche-Nnebe Agumadu
Rosemary Ekong
Jamilu A Faruk
Bello Y Jamoh
Niyi M Adebiyi
Ismail A Umar
Abdulaziz Hassan
Christopher Grace
Anuj Goel
Baba P D Inusa
Mario Falchi
Siana Nkya
Julie Makani
Hafsat R Ahmad
Obiageli Nnodu
John Strouboulis
Stephan Menzel

Genotyping was performed using the Infinium™ H3Africa Consortium Array, which contains ~2.3 million markers. A total of 1158 samples were genotyped and processed in Illumina's GenomeStudio software (version 2.05) for variant calling, following the COPILOT raw Illumina genotyping quality control (QC) protocol detailed in [38]. Seventy-seven samples with a genotyping call rate below 90% were excluded during the GenomeStudio QC; no further samples were excluded on sample quality, as the genotyping call rate was 99.99%. Individual-level QC then excluded samples whose recorded sex was discordant with X-chromosome-derived sex, heterozygosity outliers (more than 3 SD from the mean), and genetically identical individuals (identity by descent, pi-hat ~1.0) (Supplementary Table 2), retaining 1006 individuals for imputation and downstream analysis. Per-marker QC excluded SNPs with a call rate below 97%, a minor allele frequency < 1%, or deviation from Hardy–Weinberg equilibrium (P < 10⁻⁸), leaving 1 925 391 autosomal and X-chromosome SNPs. The overall genotyping rate was 99.99%. Quality control was carried out using PLINK v1.90 (www.cog-genomics.org/plink/1.9/).
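The call-rate, allele-frequency and heterozygosity thresholds described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which used GenomeStudio and PLINK; the function names, the 0/1/2/None genotype coding and the toy inputs are assumptions made for illustration. The Hardy–Weinberg test, sex check and identity-by-descent step are left to PLINK and are not reproduced here.

```python
from statistics import mean, stdev

# Thresholds quoted in the text above.
MIN_CALL_RATE = 0.97   # per-marker call rate
MIN_MAF = 0.01         # minor allele frequency
HET_SD_LIMIT = 3.0     # heterozygosity outlier limit, in SD from the mean


def marker_passes(genotypes):
    """Return True if one SNP passes the call-rate and MAF filters.

    `genotypes` holds one value per sample: 0, 1 or 2 copies of the
    alternate allele, or None for a missing call (hypothetical coding).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    call_rate = len(called) / len(genotypes)
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return call_rate >= MIN_CALL_RATE and maf >= MIN_MAF


def heterozygosity_outliers(het_rates):
    """Return sorted sample IDs more than 3 SD from the mean heterozygosity.

    `het_rates` maps a sample ID to its observed heterozygosity rate.
    """
    values = list(het_rates.values())
    centre = mean(values)
    spread = stdev(values)
    return sorted(
        sample for sample, rate in het_rates.items()
        if abs(rate - centre) > HET_SD_LIMIT * spread
    )
```

Applied to a whole dataset, `marker_passes` would be called once per SNP and `heterozygosity_outliers` once on the per-sample rates, for example those derived from the output of PLINK's `--het`.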

For principal component analysis (PCA) of genotypes, we merged our quality-controlled study dataset with the 1000 Genomes Project Phase 3 version 5 reference panel [39] after extracting the overlapping markers and excluding multi-allelic SNPs. The combined data were further filtered to remove variants with a genotyping rate below 99% or a minor allele frequency below 5%, and then pruned (--indep-pairwise 1500 150 0.2) while excluding regions of high linkage disequilibrium, before the principal components were generated in PLINK 2.0 [40]. PCA, inclusive of our dataset, was performed twice: (i) with global populations (CEU: Utah residents with Northern and Western European ancestry, representing European ancestry; CHB: Han Chinese in Beijing, China, and JPT: Japanese in Tokyo, Japan, representing East Asian ancestry; and YRI: Yoruba in Ibadan, Nigeria, representing African ancestry) and (ii) with a focus on continental African populations (ESN: Esan in Nigeria; GWD: Gambian in Western Division; LWK: Luhya in Webuye, Kenya; MSL: Mende in Sierra Leone; YRI: Yoruba in Ibadan, Nigeria). In the continental African PCA plot, our study samples were classified as NG-S (participants enrolled at the South-west Nigeria recruitment site, Lagos) or NG-N (participants enrolled at the North-central (Abuja) and North-west (Zaria) recruitment sites in Nigeria). PCA plots were created in R v4.2.2.
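The filtering, pruning and PCA steps above can be sketched as two command lines, assembled here in Python. This is an illustration under stated assumptions, not the authors' exact invocation: the file prefixes, the high-LD range file and the executable names `plink` and `plink2` are hypothetical; running the filter and pruning step in PLINK 1.9 is an assumption; and `--geno 0.01` reflects a reading of the 99% filter as a per-variant genotyping rate.

```python
import shlex


def build_pca_commands(merged_prefix, high_ld_ranges, out_prefix):
    """Return (prune_cmd, pca_cmd) as argument lists; nothing is executed.

    All three arguments are hypothetical file names or prefixes.
    """
    prune_cmd = [
        "plink",                                    # PLINK 1.9 (assumed)
        "--bfile", merged_prefix,                   # study + 1000 Genomes merge
        "--geno", "0.01",                           # drop variants with > 1% missing calls
        "--maf", "0.05",                            # drop variants with MAF < 5%
        "--exclude", "range", high_ld_ranges,       # skip high-LD regions
        "--indep-pairwise", "1500", "150", "0.2",   # window, step, r^2 threshold
        "--out", out_prefix,                        # writes <out_prefix>.prune.in
    ]
    pca_cmd = [
        "plink2",                                   # PLINK 2.0 (assumed name)
        "--bfile", merged_prefix,
        "--extract", out_prefix + ".prune.in",      # keep only the pruned variants
        "--pca",                                    # default number of components
        "--out", out_prefix + "_pca",
    ]
    return prune_cmd, pca_cmd


if __name__ == "__main__":
    for cmd in build_pca_commands("merged", "high_ld_regions.txt", "pruned"):
        print(shlex.join(cmd))
```

Because PLINK applies variant filters before pruning, the `.prune.in` list contains only variants that passed the missingness, frequency and range filters, so the second command does not need to repeat them.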
