2.3. Genotyping

Yunzhou Yang
Yanjun Zan
Christa F. Honaker
Paul B. Siegel
Örjan Carlborg

For chickens from generations S41, S50 and S53 of HWS/LWS and generation R9 of HWR/LWR, 60K chicken SNP-chip genotypes were available from earlier studies [8,14]. In addition, individual whole-genome re-sequencing (WGS) data (~30X) on HWS/LWS chickens from generation S41 were available from an earlier study [17]. Here, BEAGLE4 was used to impute genotypes from SNP-chip to WGS SNP density in the chickens from HWS/LWS generations S50 and S53 and HWR/LWR generation R9 [18]. Before imputation, the WGS data were filtered using the following criteria: (1) minor allele frequency (MAF) > 0.05; (2) only bi-allelic SNPs; (3) genotyping quality (GQ) = 20 and mapping quality (MQ) = 50; (4) no missing genotypes. This left 5,524,212 SNPs on the 28 largest autosomes. From the SNP-chip genotypes, only the 29,147 SNPs that were also genotyped in the WGS dataset were kept, whereas the remaining 24,166 markers were filtered out (Figure S1). After imputation, a custom Python script (https://github.com/tuzixuexi/Purging-Analysis) was used to keep only those markers with an estimated genotype probability (GP) above 0.9 in at least 90% of the samples. The dataset after this filtering contained 3,051,963 high-quality imputed SNP genotypes. Before screening for selection signatures in the high and low body-weight lineages, a second filter was applied to retain only SNPs with a MAF across all generations greater than 0.05, leaving 2,182,770 and 1,348,038 SNPs in the LW and HW lineages, respectively.
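The two post-imputation filters described above can be illustrated with a minimal Python sketch. This is not the authors' script from the repository cited above. The function names and input layout are hypothetical, and it assumes that a sample's "GP" is the largest of its three genotype posterior probabilities (as in a VCF GP field) and that genotypes are coded as alternate-allele counts 0, 1 or 2.

```python
def passes_gp_filter(gp_triplets, gp_threshold=0.9, min_sample_fraction=0.9):
    """Return True if enough samples have a confident imputed genotype.

    gp_triplets: one (P(0/0), P(0/1), P(1/1)) tuple per sample (assumed layout).
    A sample counts as confident when its largest probability exceeds
    gp_threshold; the marker is kept when the confident fraction is at
    least min_sample_fraction.
    """
    if not gp_triplets:
        return False
    confident = sum(1 for gp in gp_triplets if max(gp) > gp_threshold)
    return confident / len(gp_triplets) >= min_sample_fraction


def minor_allele_frequency(allele_counts):
    """MAF from per-sample alternate-allele counts (0, 1 or 2), diploid."""
    if not allele_counts:
        raise ValueError("no genotypes supplied")
    alt_freq = sum(allele_counts) / (2 * len(allele_counts))
    return min(alt_freq, 1.0 - alt_freq)


def passes_maf_filter(allele_counts, maf_threshold=0.05):
    """Return True if the marker's MAF is strictly above maf_threshold."""
    return minor_allele_frequency(allele_counts) > maf_threshold


# Toy marker: 9 of 10 samples are confidently imputed, so it is kept.
sure = (0.98, 0.01, 0.01)
unsure = (0.50, 0.45, 0.05)
keep_marker = passes_gp_filter([sure] * 9 + [unsure])
```

In practice the probabilities would be read from the imputed VCF, one marker at a time, so that the whole genotype matrix never has to be held in memory.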

For 2982 individuals from AIL generations F1 to F15, whole-genome re-sequencing libraries were prepared using a low-cost Tn5-based protocol. Sequencing was then performed to ~0.4X coverage (mean = 0.48; range 0.003–3.2; n = 16 individuals with < 400 reads removed) on either an Illumina HiSeq4000 or HiSeq X10 [17,19]. Genotypes were imputed for the individual samples using STITCH [20] against the Gallus_gallus-5.0 reference genome (GCA_000002315.3) [21]. To select the K-value for imputation, 10 individuals from generation F0 were down-sampled from ~30X to ~0.4X coverage using seqtk (https://github.com/lh3/seqtk). These individuals served as control samples for calculating the concordance between genotypes imputed by STITCH from the low-coverage sequencing (LCS) data at different K-values and genotypes called from the deep-coverage sequence data on the same individuals. Analyses with different values of K were performed for GGA28 (RefSeq: NC_006115.4). Based on these, a K-value of 10 and a number of generations of 19 were selected, because they provided an acceptable balance between computation time and imputation quality (Figure S2).
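The control-sample comparison can be sketched in Python as follows. The text does not give the exact concordance definition, so this sketch assumes the simplest one: the fraction of sites at which the imputed genotype equals the genotype called from deep-coverage data, skipping sites that are missing in either set. The function names and the 0/1/2 genotype coding are illustrative assumptions.

```python
def downsample_fraction(source_coverage, target_coverage):
    """Fraction of reads to keep to go from source to target coverage."""
    if source_coverage <= 0 or not 0 < target_coverage <= source_coverage:
        raise ValueError("coverages must satisfy 0 < target <= source")
    return target_coverage / source_coverage


def genotype_concordance(imputed, truth):
    """Fraction of comparable sites with identical genotypes.

    imputed, truth: equal-length sequences of alternate-allele counts
    (0, 1, 2), with None marking a missing genotype (assumed coding).
    """
    if len(imputed) != len(truth):
        raise ValueError("genotype vectors must have the same length")
    pairs = [(i, t) for i, t in zip(imputed, truth)
             if i is not None and t is not None]
    if not pairs:
        raise ValueError("no sites with genotypes in both sets")
    matches = sum(1 for i, t in pairs if i == t)
    return matches / len(pairs)


# Going from ~30X to ~0.4X keeps roughly 1.3% of the reads.
fraction = downsample_fraction(30.0, 0.4)

# Toy comparison for one control individual: 2 of 3 comparable sites agree.
concordance = genotype_concordance([0, 1, 2, None], [0, 1, 1, 2])
```

Repeating the concordance calculation for each candidate K on the same control individuals gives the quality side of the trade-off against computation time.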
