We tested for positive selection acting on LoF mutations by calculating two neutrality statistics: the interpopulation FST, which identifies loci displaying high levels of variation in allele frequencies between groups of populations (51), and the intrapopulation iHS (52), which compares the extent of haplotype homozygosity at the ancestral and derived alleles. Positive selection analyses were confined to biallelic SNPs predicted to have severely damaging consequences and found in the 1000 Genomes Project phase 3 data (90), which include 2,504 individuals from 26 populations assigned to five metapopulations (SI Appendix, Table S1). Multiallelic SNPs, SNPs not detected in the 1000 Genomes Project, and frameshift indels were excluded from the positive selection analysis. For FST calculation, we investigated a total of 75 LoF mutations that passed quality filters, and compared the allele frequencies of these variants in the 26 populations with their frequencies in the European CEU and African Yoruba in Ibadan (YRI) reference populations (SI Appendix, Table S1). More specifically, we compared allele frequencies in populations from the AFR, EAS, and SAS metapopulations with those in the CEU group, and allele frequencies in populations from the AMR and EUR metapopulations with those in the YRI group. To detect candidate variants for positive selection based on FST values, we used an outlier approach and considered LoF mutations with FST values in the top 5% of the genome-wide FST distribution. We identified 32 LoF mutations with high FST values in at least one population, in 32 genes (including 8 mutations located in OR genes).
For haplotype-based iHS score calculations, we first defined the derived allele state of each SNP based on the 6-EPO alignment, and retained only SNPs with a derived allele frequency between 10% and 90%, to maximize the power of iHS to detect selective signals. These additional filters left a total of 68 LoF mutations for investigation. We calculated iHS scores in 100-kb windows with custom-generated scripts and normalized the values. To detect selection events targeting derived alleles, we considered LoF mutations with iHS values among the 5% most negative genome wide, and found a total of 15 LoF mutations meeting this criterion in at least one population (including 1 mutation located in an OR gene).
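The outlier logic described above can be illustrated with a minimal Python sketch. It is not the custom scripts used in this study. Several points are assumptions made for illustration only: the FST estimator (a simple Wright-style FST from two allele frequencies with equal population weights, since the estimator is not specified here), the inclusive bounds of the derived allele frequency filter, and all function names (`wright_fst`, `passes_daf_filter`, `empirical_tail`, `is_outlier`), which are hypothetical.

```python
from bisect import bisect_left, bisect_right


def wright_fst(p1, p2):
    """Wright-style FST for one biallelic SNP from the allele frequencies
    in two populations, weighting both populations equally (assumption)."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # SNP fixed for the same allele in both populations
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def passes_daf_filter(daf, low=0.10, high=0.90):
    """Keep SNPs with a derived allele frequency between 10% and 90%.
    Treating the bounds as inclusive is an assumption."""
    return low <= daf <= high


def empirical_tail(value, background_sorted, tail):
    """Fraction of genome-wide background values at least as extreme as
    `value`. tail="upper" suits FST; tail="lower" suits negative iHS."""
    n = len(background_sorted)
    if tail == "upper":
        return (n - bisect_left(background_sorted, value)) / n
    return bisect_right(background_sorted, value) / n


def is_outlier(value, background, tail, alpha=0.05):
    """True if `value` lies in the most extreme `alpha` fraction of the
    genome-wide background distribution."""
    return empirical_tail(value, sorted(background), tail) <= alpha
```

Under these assumptions, an LoF SNP would be flagged as an FST candidate when `is_outlier(fst, genome_wide_fst, "upper")` is true for at least one population comparison, and as an iHS candidate when `is_outlier(ihs, genome_wide_ihs, "lower")` is true after the derived allele frequency filter has been applied.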