Recombination rate and functional annotation

Gregg W. C. Thomas
Jonathan J. Hughes
Tomohiro Kumon
Jacob S. Berv
C. Erik Nordgren
Michael Lampson
Mia Levine
Jeremy B. Searle
Jeffrey M. Good

To assess whether phylogenetic discordance along chromosomes is correlated with mouse recombination rates, we retrieved 10,205 genetic markers generated from a large heterogeneous stock of outbred mice (Shifman et al. 2006; Cox et al. 2009). We converted the physical coordinates of these markers from build 37 (mm9) to build 38 (mm10) of the M. musculus genome using liftOver (Hinrichs et al. 2006). We then partitioned the markers into 5Mb windows and estimated the local recombination rate in each window. The estimated recombination rate was defined as the slope of the relationship between genetic map position and physical map position in the M. musculus genome for all markers in the window (White et al. 2009; Kartje et al. 2020). Within each 5Mb window, we calculated wRF distances between the first 10kb window and every other 10kb window.
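The per-window rate estimate described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes the slope is an ordinary least-squares fit of genetic position (cM) on physical position (Mb), that windows are aligned to multiples of 5 Mb from the start of each chromosome, and that windows with fewer than two markers are skipped. None of these details are specified in the text, and the function names, the `min_markers` parameter, and the toy marker values are hypothetical.

```python
from collections import defaultdict

WINDOW_SIZE = 5_000_000  # 5 Mb windows, as in the text


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs (None if xs are all equal)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def window_recombination_rates(markers, window_size=WINDOW_SIZE, min_markers=2):
    """Estimate cM/Mb per window from (chrom, physical_bp, genetic_cM) tuples.

    Returns a dict keyed by (chrom, window_start_bp).
    The min_markers cutoff is an assumption made for this sketch.
    """
    by_window = defaultdict(list)
    for chrom, pos_bp, pos_cm in markers:
        start = (pos_bp // window_size) * window_size
        # Store physical position in Mb so the slope is in cM per Mb.
        by_window[(chrom, start)].append((pos_bp / 1e6, pos_cm))

    rates = {}
    for key, points in by_window.items():
        if len(points) < min_markers:
            continue
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        rate = ols_slope(xs, ys)
        if rate is not None:
            rates[key] = rate
    return rates


# Toy markers (made-up values): three in the first window, two in the second.
markers = [
    ("chr1", 1_000_000, 0.5),
    ("chr1", 2_000_000, 1.0),
    ("chr1", 4_000_000, 2.0),
    ("chr1", 6_000_000, 2.2),
    ("chr1", 8_000_000, 2.4),
]
rates = window_recombination_rates(markers)
for (chrom, start), rate in sorted(rates.items()):
    print(chrom, start, round(rate, 3))
```

In the toy data the first window has a steeper slope than the second, which corresponds to a higher estimated local recombination rate.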

We also compared the chromosome-wide wRF distances to those based on phylogenies from regions adjacent to several types of genomic features. We retrieved coordinates for 25,753 protein-coding genes annotated in M. musculus from Ensembl (release 99; Cunningham et al. 2022), all 3,129 UCEs from the M. musculus UCE probe set provided with PHYLUCE (Faircloth et al. 2012; Faircloth 2016), and 9,865 recombination hotspots from Smagulova et al. (2011). The recombination hotspot coordinates were converted from build 37 to build 38 using the liftOver tool (Hinrichs et al. 2006). For each feature, the starting window was the 10kb window containing the feature’s midpoint coordinate. We then calculated wRF between this window and all windows within 5Mb in either direction. For each chromosome, we compared the slope and wRF distance of windows adjacent to the feature with the same metrics for the whole chromosome. We compared distributions of these measures for each genomic feature with an ANOVA (aov(feature.measure ~ feature.label)) followed by Tukey’s range test (TukeyHSD(anova.result)) to assess differences in means, as implemented in R v4.1.1 (R Core Team 2021).
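The window selection around each feature can be illustrated with a short sketch. It assumes 0-based coordinates, 10 kb windows aligned to the start of the chromosome, and that "within 5Mb" means up to 500 windows on each side of the starting window, truncated at the chromosome ends. These conventions are assumptions for illustration, and the function name, the feature coordinates, and the chromosome length are hypothetical.

```python
SMALL_WINDOW = 10_000   # 10 kb windows, as in the text
FLANK = 5_000_000       # 5 Mb in either direction, as in the text


def feature_windows(start_bp, end_bp, chrom_length_bp,
                    window_size=SMALL_WINDOW, flank=FLANK):
    """Return (focal_window_index, neighbor_window_indices) for one feature.

    The focal window is the one containing the feature's midpoint.
    Neighbors are all other windows within `flank` bp on either side,
    clipped to the chromosome boundaries (assumed convention).
    """
    midpoint = (start_bp + end_bp) // 2
    focal = midpoint // window_size
    n_flank = flank // window_size
    last_window = (chrom_length_bp - 1) // window_size
    lo = max(0, focal - n_flank)
    hi = min(last_window, focal + n_flank)
    neighbors = [w for w in range(lo, hi + 1) if w != focal]
    return focal, neighbors


# Hypothetical feature in the middle of a hypothetical 195 Mb chromosome.
focal, neighbors = feature_windows(12_345_000, 12_355_999, 195_000_000)
print(focal, len(neighbors), neighbors[0], neighbors[-1])

# Hypothetical feature near the chromosome start: the left flank is truncated.
edge_focal, edge_neighbors = feature_windows(20_000, 30_000, 195_000_000)
print(edge_focal, len(edge_neighbors))
```

Each neighbor index would then be paired with the focal window when computing wRF, so features near chromosome ends contribute fewer comparisons under this convention.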
