Conversion of empirical data to genotype matrices
This protocol is extracted from research article:
Detecting adaptive introgression in human evolution using convolutional neural networks
eLife, May 25, 2021; DOI: 10.7554/eLife.64669

Using bcftools (Li, 2011), we performed a locus-wise intersection of the following VCFs: 1000 Genomes (The 1000 Genomes Project Consortium, Auton et al., 2015), IGDP (Jacobs et al., 2019), the high-coverage Denisovan genome (Meyer et al., 2012), and the Altai and Vindija Neanderthal genomes (Prüfer et al., 2014). All VCFs corresponded to the GRCh37/hg19 reference sequence. Genotype matrices were constructed by parsing the output of bcftools query over 100 kbp windows, filtering out sites with a sample allele frequency below 5% or with more than 10% of genotypes missing, and then excluding windows with fewer than 20 segregating sites. Each genotype matrix was then resized and sorted as described for simulations. When data were considered to be phased, as for the CEU/YRI populations, we also treated the Neanderthal genotypes as if they were phased according to the REF/ALT columns in the VCF. Although this is equivalent to random phasing, both high-coverage Neanderthal individuals are highly inbred, so this is unlikely to be problematic in practice.
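
The site and window filters described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the input lines come from a command such as bcftools query -f '%CHROM\t%POS[\t%GT]\n', that windows are non-overlapping, that "sample allele frequency" means the minor allele frequency among called alleles, and that unphased calls are kept in the allele order written in the genotype field. The function names (parse_line, keep_site, genotype_matrices) and the use of -1 for missing alleles are hypothetical choices made for this example.

```python
import re
from collections import defaultdict

WINDOW = 100_000     # window size in bp (from the text)
MIN_FREQ = 0.05      # allele-frequency threshold (from the text)
MAX_MISSING = 0.10   # maximum fraction of missing genotypes (from the text)
MIN_SITES = 20       # minimum segregating sites per window (from the text)


def parse_line(line):
    """Parse one line of the assumed format: CHROM, POS, then one GT per sample."""
    chrom, pos, *fields = line.rstrip("\n").split("\t")
    # Split on '|' or '/' so unphased calls keep the allele order as written.
    genotypes = [
        tuple(None if a == "." else int(a) for a in re.split(r"[|/]", gt))
        for gt in fields
    ]
    return chrom, int(pos), genotypes


def keep_site(genotypes):
    """Apply the missingness and allele-frequency filters to one site."""
    n_missing = sum(1 for gt in genotypes if None in gt)
    if n_missing > MAX_MISSING * len(genotypes):
        return False
    called = [a for gt in genotypes for a in gt if a is not None]
    if not called:
        return False
    # Assumption: any non-reference allele counts as "alternate".
    alt_freq = sum(1 for a in called if a > 0) / len(called)
    return min(alt_freq, 1 - alt_freq) >= MIN_FREQ


def genotype_matrices(lines):
    """Group retained sites into windows; drop windows with too few sites.

    Returns a dict keyed by (chrom, window index). Each value is a list of
    rows, one per site, with one column per haplotype (-1 marks missing).
    """
    windows = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        chrom, pos, genotypes = parse_line(line)
        if keep_site(genotypes):
            row = [-1 if a is None else a for gt in genotypes for a in gt]
            windows[(chrom, (pos - 1) // WINDOW)].append(row)
    return {key: rows for key, rows in windows.items() if len(rows) >= MIN_SITES}
```

Each returned matrix would still need the resizing and sorting steps described for simulations, which are not covered by this sketch.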
