Genotyping and data preprocessing

Katsuhiko Mineta
Kosuke Goto
Takashi Gojobori
Fowzan S. Alkuraya

A total of 532,615 autosomal SNPs were genotyped in 1,073 individuals using the Affymetrix Axiom genotyping assay (Axiom Genome-wide CEU 1 Array Plate, AxiomGWH-96Array, Axiom 2.0 Kit). Sample preparation, including whole-genome amplification, fragmentation, denaturation and hybridization, was performed according to the manufacturer’s specifications and recommendations (Affymetrix, Santa Clara, California, USA). Automated, high-throughput processing for genome-wide SNP genotyping was carried out on the GeneTitan system (Affymetrix).

Relatedness was assessed using kinship coefficients estimated with KING [31]. We ran KING to extract a list of individuals containing no pairs with a first-, second-, or third-degree relationship. PLINK [32,33] was used to filter the 532,615 autosomal SNPs down to 455,266 SNPs with a minor allele frequency greater than 1%, a missing rate less than 10%, and a Hardy-Weinberg equilibrium (HWE) p-value of at least 0.01. Using PLINK's outlier detection diagnostics, we identified individuals with an extremely low Z score (more than 4 standard deviation units below the mean) as outliers and excluded them from subsequent analysis. Only the remaining 957 unrelated individuals (S3 Table) were used in the subsequent analyses: principal component analysis (PCA), measurement of Wright’s fixation index (FST), admixture analysis, TreeMix analysis, inbreeding coefficient estimation, and estimation of the date and degree of admixture using ALDER and f4-ratio estimation.
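The three per-SNP thresholds above (MAF greater than 1%, missing rate less than 10%, HWE p-value of at least 0.01) can be illustrated with a minimal Python sketch. It is not the PLINK implementation: it assumes genotypes are coded as alternate-allele counts (0, 1, 2) with None for a missing call, and it substitutes a 1-degree-of-freedom chi-square HWE test for the exact test that PLINK uses by default. The function names `hwe_chisq_p` and `passes_qc` are hypothetical.

```python
import math
from typing import Optional, Sequence

# Assumed coding: alternate-allele count per individual (0, 1, 2); None = missing call.
Genotype = Optional[int]


def hwe_chisq_p(n_hom1: int, n_het: int, n_hom2: int) -> float:
    """HWE p-value from a 1-df chi-square test (an approximation; PLINK defaults to an exact test)."""
    n = n_hom1 + n_het + n_hom2
    if n == 0:
        return 1.0
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of the allele homozygous in n_hom1
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom1, n_het, n_hom2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(calls: Sequence[Genotype], maf_min: float = 0.01,
              miss_max: float = 0.10, hwe_min: float = 0.01) -> bool:
    """Keep a SNP only if MAF > maf_min, missing rate < miss_max and HWE p >= hwe_min."""
    if not calls:
        return False
    called = [g for g in calls if g is not None]
    missing_rate = 1 - len(called) / len(calls)
    if not called or missing_rate >= miss_max:
        return False
    alt_freq = sum(called) / (2 * len(called))
    if min(alt_freq, 1 - alt_freq) <= maf_min:
        return False
    return hwe_chisq_p(called.count(0), called.count(1), called.count(2)) >= hwe_min
```

For example, a SNP with 36, 48 and 16 individuals carrying 0, 1 and 2 alternate alleles is in perfect HWE and passes, whereas one with 50 homozygotes of each kind and no heterozygotes fails the HWE filter.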

Data from the 1000 Genomes Project [15], the Human Genome Diversity Project (HGDP) [34], the Simons Genome Diversity Project (SGDP) [35], and the Qatari Genome [19] were used as references to assess how the Saudi population samples relate to other human populations. As with the Saudi data, we used KING to exclude duplicate individuals from the integrated reference data. Because the Saudi samples and the samples in the reference databases were generated on different platforms, the analysis was limited to the SNPs shared between these platforms. This intersection contained 426,056 SNPs, which were sufficient to produce reliable results and were used for all subsequent analyses.
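The text does not state how SNPs were matched across platforms. As an illustration only, the following Python sketch computes such an intersection under stated assumptions: each platform is represented by a hypothetical map from rsID to (chromosome, position, allele1, allele2), a SNP is kept when both panels agree on position and on the allele pair (directly or after a strand flip), and strand-ambiguous A/T and C/G SNPs are optionally dropped. Dropping ambiguous SNPs is a common convention, not something the source describes, and `intersect_panels` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Hypothetical per-platform SNP map: rsID -> (chromosome, position, allele1, allele2)
SnpInfo = Tuple[str, int, str, str]

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(snp: SnpInfo) -> bool:
    """A/T and C/G SNPs look identical on both strands, so a flip cannot be detected."""
    return {snp[2], snp[3]} in ({"A", "T"}, {"C", "G"})


def alleles_match(a: SnpInfo, b: SnpInfo) -> bool:
    """Same position and same allele pair, either directly or after complementing b."""
    if (a[0], a[1]) != (b[0], b[1]):
        return False
    alleles_a = {a[2], a[3]}
    alleles_b = {b[2], b[3]}
    flipped_b = {COMPLEMENT.get(x, x) for x in alleles_b}
    return alleles_a == alleles_b or alleles_a == flipped_b


def intersect_panels(study: Dict[str, SnpInfo], reference: Dict[str, SnpInfo],
                     drop_ambiguous: bool = True) -> List[str]:
    """Return the sorted rsIDs usable in a joint study + reference analysis."""
    kept = []
    for rsid in sorted(set(study) & set(reference)):
        if drop_ambiguous and is_strand_ambiguous(study[rsid]):
            continue
        if alleles_match(study[rsid], reference[rsid]):
            kept.append(rsid)
    return kept
```

Matching on position and alleles as well as rsID guards against SNPs whose coordinates or allele coding differ between arrays and sequencing-based reference panels.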