DNA was obtained from 16 animals representing all six species of Papio baboons and the gelada, T. gelada (table S3). Of the 16 individuals, 9 were wild animals sampled in the field, and the remaining 7 were obtained from captive colonies. The species identity of each sampled animal was determined from its external phenotype. The integrity of the resulting sequence data files was confirmed by comparing the mtDNA sequences obtained through whole-genome analysis with other mtDNA sequences from baboons of known species and geographic location (12); this comparison confirmed all species assignments. All diversity samples were sequenced to an average read depth of 30.7× on the Illumina HiSeq 2000 platform (100-bp paired-end reads); the only exception was the T. gelada sample, which was sequenced on the Illumina HiSeq X platform.
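The link between read count, read length, and average depth follows the standard expected-coverage relation C = N × L / G (reads × read length / genome size). The sketch below is illustrative only: the function name is hypothetical, the 3.0-Gb genome size is a round placeholder rather than the size of Panu3.0, and it counts raw sequenced bases, whereas a reported depth such as 30.7× is normally measured from aligned reads after duplicate removal and will therefore be lower than this raw estimate for the same run.

```python
def expected_coverage(read_pairs: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Expected mean depth C = N * L / G, with N = 2 * read_pairs individual reads."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    # Both mates of every pair contribute read_length_bp bases.
    total_bases = 2 * read_pairs * read_length_bp
    return total_bases / genome_size_bp


# Hypothetical illustration: 3.0 Gb is a placeholder, not the Panu3.0 assembly size.
HYPOTHETICAL_GENOME_BP = 3_000_000_000
depth = expected_coverage(460_500_000, 100, HYPOTHETICAL_GENOME_BP)
print(f"{depth:.1f}x")  # prints "30.7x"
```

Under these assumptions, about 460 million 100-bp read pairs would be needed to reach a raw depth of 30.7×.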

We used BWA-MEM version 0.7.12-r1039 (https://arxiv.org/abs/1303.3997) to align the Illumina reads to the baboon reference assembly Panu3.0/papAnu3 and generate BAM (Binary Alignment Map) files (fig. S2). Picard MarkDuplicates version 1.105 (http://broadinstitute.github.io/picard/) was used to identify and mark duplicate reads. Variants were called using GATK version 3.3-0 following the best practices for that version (https://software.broadinstitute.org/gatk/best-practices/). In brief, reads around indels were realigned using IndelRealigner, and HaplotypeCaller was run on each sample to generate per-sample gVCFs. Joint genotype calling was then performed on all samples using GenotypeGVCFs to generate a single VCF file. GATK hard filters (SNPs: “QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < −12.5 || ReadPosRankSum < −8.0”; Indels: “QD < 2.0 || FS > 200.0 || ReadPosRankSum < −20.0”) (https://software.broadinstitute.org/gatk/documentation/article?id=2806) were applied, and all variant calls that failed the filters were removed.
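The hard-filter expressions quoted above amount to threshold tests on per-site INFO annotations, where a site is removed if any condition holds. The Python sketch below restates them outside GATK for illustration. The function names and the dictionary input are hypothetical. The sketch simply skips any test whose annotation is absent (rank-sum annotations, for example, are often missing at sites without heterozygous calls). How GATK's own JEXL evaluation treats missing annotations depends on version and options, so results at such sites may differ from VariantFiltration.

```python
from typing import Mapping, Optional


def _below(info: Mapping[str, float], key: str, threshold: float) -> bool:
    # A missing annotation never triggers this test (an assumption of this sketch).
    value: Optional[float] = info.get(key)
    return value is not None and value < threshold


def _above(info: Mapping[str, float], key: str, threshold: float) -> bool:
    value: Optional[float] = info.get(key)
    return value is not None and value > threshold


def fails_hard_filter(info: Mapping[str, float], is_snp: bool) -> bool:
    """Return True if a site matches any of the quoted hard-filter conditions."""
    if is_snp:
        # "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"
        return (
            _below(info, "QD", 2.0)
            or _above(info, "FS", 60.0)
            or _below(info, "MQ", 40.0)
            or _below(info, "MQRankSum", -12.5)
            or _below(info, "ReadPosRankSum", -8.0)
        )
    # Indels: "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0"
    return (
        _below(info, "QD", 2.0)
        or _above(info, "FS", 200.0)
        or _below(info, "ReadPosRankSum", -20.0)
    )


# Hypothetical sites: (INFO annotations, is_snp). Failing sites are dropped.
sites = [
    ({"QD": 25.0, "FS": 1.2, "MQ": 60.0}, True),
    ({"QD": 1.1, "FS": 0.0, "MQ": 60.0}, True),   # fails: QD < 2.0
    ({"QD": 12.0, "FS": 150.0}, False),           # indel: passes, FS limit is 200
]
kept = [info for info, snp in sites if not fails_hard_filter(info, snp)]
print(len(kept))  # prints 2
```

The separate thresholds for SNPs and indels mean that the same annotation value (here FS = 150) can remove a SNP but keep an indel.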

To perform functional annotation with WGSA (Whole Genome Sequence Annotator) (48), the SNVs identified in the baboon diversity panel were transferred to the human genome (hg19) using liftOver and treated as human SNVs. All annotation resources available in WGSA version 0.5 were used for this analysis, including five functional prediction scores, eight conservation scores, allele frequencies from four large-scale resequencing studies, and variants in four disease-related databases, among others.
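UCSC liftOver reads intervals in BED format, which uses 0-based, half-open coordinates, whereas VCF positions are 1-based. It also needs a chain file from the baboon assembly to hg19. The source does not describe how the SNVs were prepared for liftOver, so the sketch below shows one plausible approach. It keeps single-nucleotide records only and builds a hypothetical `chrom:pos:ref:alt` name field to link lifted positions back to the original calls. It also writes a "+" strand column, assuming liftOver reports "-" for a site that maps to the opposite hg19 strand; the alleles of such a site would then need reverse-complementing before annotation.

```python
from typing import Iterable, Iterator


def snvs_to_bed6(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Convert biallelic or multiallelic SNV records from VCF text to BED6 lines."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _vid, ref, alt = fields[:5]
        alts = alt.split(",")
        # Keep single-nucleotide variants only; skip indels and placeholder alleles.
        if len(ref) != 1 or any(len(a) != 1 or a in {"*", "."} for a in alts):
            continue
        start = int(pos) - 1  # VCF POS is 1-based; BED start is 0-based
        end = int(pos)        # BED end is exclusive, so the interval covers one base
        name = f"{chrom}:{pos}:{ref}:{alt.replace(',', '/')}"  # hypothetical ID scheme
        yield "\t".join([chrom, str(start), str(end), name, "0", "+"])
```

Writing the resulting lines to a file gives a BED input for liftOver; the name column survives the conversion, so each hg19 position can be joined back to its baboon record.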
