Sex-specific SNP analysis

Kristen A. Behrens
Holger Zimmermann
Radim Blažek
Martin Reichard
Stephan Koblmüller
Thomas D. Kocher

Our analyses rest on the identification of sex-specific SNPs. These SNPs were identified following the methods described in Behrens et al. [52], using the pipeline developed by Gammerdinger et al. [43]. The code for that pipeline is available at https://github.com/Gammerdinger/sex-SNP-finder. Briefly, the sequence reads were aligned with BWA version 0.7.12 using the default parameters along with read group labels [53]. We aligned all samples to the closest high-quality reference assembly, the Malawi zebra (Maylandia zebra, assembly UMD2a, RefSeq GCF_000238955.4) [54]. In some cases, part of the sex-specific signal mapped to unanchored contigs of the Malawi zebra assembly, so we remapped the reads to the more contiguous assembly of Nile tilapia (Oreochromis niloticus, assembly UMD_NMBU, RefSeq GCF_001858055.2) [55]. At each variable nucleotide site we calculated the FST statistic between the male and female sequence reads. The resulting FST plots provide a first indication of the differentiation between male and female genomes. We further identified XY- and ZW-patterned SNPs as SNPs that were fixed (frequency less than 0.1) in one sex and polymorphic (frequency between 0.3 and 0.7) in the other sex. Separate plots of the allele frequency of XY- and ZW-patterned SNPs often allowed us to determine which heterogametic system (XY or ZW) was segregating.
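The per-site logic described above can be illustrated with a minimal Python sketch. It is not the Gammerdinger et al. pipeline: the function names are hypothetical, the FST estimator shown is a simple Nei-style (H_T - H_S) / H_T for a biallelic site with equal weight per sex (the text does not state which estimator was used), and the inclusive boundaries on the 0.3 to 0.7 range are an assumption. Only the thresholds themselves come from the text.

```python
def fst_biallelic(p_male, p_female):
    """Nei-style FST for one biallelic site, equal weight per sex.

    Assumption: the source does not name its FST estimator; this is
    the textbook (H_T - H_S) / H_T form, used for illustration only.
    """
    p_bar = (p_male + p_female) / 2
    h_total = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                             # monomorphic site: no differentiation
    h_within = (2 * p_male * (1 - p_male) + 2 * p_female * (1 - p_female)) / 2
    return (h_total - h_within) / h_total


def classify_sex_pattern(p_male, p_female,
                         fixed_max=0.1, poly_min=0.3, poly_max=0.7):
    """Label a SNP as 'XY', 'ZW', or None from one allele's frequency per sex.

    XY-patterned: fixed in females (< fixed_max), polymorphic in males.
    ZW-patterned: fixed in males (< fixed_max), polymorphic in females.
    Inclusive polymorphic bounds are an assumption of this sketch.
    """
    male_poly = poly_min <= p_male <= poly_max
    female_poly = poly_min <= p_female <= poly_max
    if p_female < fixed_max and male_poly:
        return "XY"
    if p_male < fixed_max and female_poly:
        return "ZW"
    return None
```

For example, an allele at frequency 0.5 in males and absent in females would be labeled XY-patterned, which is the pattern expected for a male-heterogametic system.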

Next, we used the Bedtools [56] makewindows and coverage tools to calculate the density of sex-patterned SNPs in 100 kbp windows across the genome. We identified the top 1% of windows (~78 of 7,800 anchored windows) with the highest number of sex-patterned SNPs using the methodology described in Kocher et al. [57]. The log2(XY:ZW) ratio of SNP density was then calculated for each window [58]. A Kruskal–Wallis (KW) test on the ranked data was conducted in R (v.2023.03.0+386) using kruskal.test from the stats package to determine whether the log ratio differed among chromosomes [59]. If the differences were statistically significant, Dunn's test from the rstatix R package was conducted post hoc, with Benjamini–Hochberg correction for multiple tests, to determine which chromosomes differed significantly from one another. Regions of elevated sex-specific SNP density were visualized in IGV [60] to identify candidate sex-determining genes.
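The windowing steps above can be sketched in plain Python as a rough stand-in for the Bedtools calls. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, SNP positions are taken to be 1-based, and the pseudocount of 1 in the log ratio is an assumption added here so that windows with zero XY- or ZW-patterned SNPs stay defined (the text does not say how such windows were handled).

```python
import math
from collections import Counter

WINDOW = 100_000  # 100 kbp windows, as in the text


def window_counts(snps, window=WINDOW):
    """Count SNPs per fixed-size window.

    snps: iterable of (chrom, pos) with 1-based positions (assumption).
    Returns a Counter keyed by (chrom, 0-based window start).
    """
    counts = Counter()
    for chrom, pos in snps:
        counts[(chrom, (pos - 1) // window * window)] += 1
    return counts


def top_fraction(counts, n_windows, fraction=0.01):
    """Return the highest-count windows, keeping `fraction` of n_windows.

    n_windows is the total number of windows in the genome, including
    those with no SNPs (about 7,800 anchored windows in the text).
    """
    n_top = max(1, round(n_windows * fraction))
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n_top]


def log2_ratio(xy_count, zw_count, pseudocount=1):
    """log2(XY:ZW) for one window; the pseudocount is an assumption."""
    return math.log2((xy_count + pseudocount) / (zw_count + pseudocount))
```

Positive values of the log ratio indicate a window dominated by XY-patterned SNPs and negative values a window dominated by ZW-patterned SNPs; the per-chromosome comparison of these values is what the Kruskal–Wallis test then evaluates.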
