Detection of outliers using LFMM

Joanna Meger
Bartosz Ulaszewski
Jaroslaw Burczyk

Because phenotypic variables were represented by population averages rather than individual measurements, we applied the same approaches (LFMM and RDA) to detect loci with significant effects related to geographic, climate, and phenotypic variables. We used the latent factor mixed model (LFMM) approach [173] to find candidate loci under selection. According to P de Villemereuil, É Frichot, É Bazin, O François and OE Gaggiotti [76], LFMM is expected to provide the best compromise between power and error rate across different analytical scenarios. LFMM is also reported to be less susceptible to both false negatives and false positives [173, 174] than other genotype-environment association (GEA) methods, such as Bayenv2 [175], because it does not rely on a specific demographic model when accounting for population structure [76, 174].

We employed an MCMC algorithm for regression analysis in which potentially confounding population structure is modeled with unobserved (latent) factors [176]. As missing data can reduce the power of association studies [177, 178], we imputed the missing data based on the ancestry coefficients estimated by sNMF, using the “impute” function from the R package LEA [176]. In sNMF, we set K to the number of distinct genetic clusters identified in the population genetic structure analysis and kept the best of 10 runs according to the cross-entropy criterion.

The MCMC algorithm was run for each of the geographic, climate, and phenotypic variables (i.e., longitude, latitude, altitude, PC1–PC3, spring and autumn phenology, and height), using 50,000 steps for burn-in and 100,000 additional steps to compute LFMM parameters (z-scores) for all loci. The number of latent factors was set to the identified value of K. To compensate for run-to-run variation, the analysis was repeated over 10 independent runs, and z-scores were then combined across runs in R using the LEA package [176]. The LEA package was also used to adjust p-values for multiple testing using the Benjamini–Hochberg procedure, and to calculate the genomic inflation factor used to modify z-scores, allowing control of the false discovery rate (FDR), as described in E Frichot and O François [176]. A list of candidate loci with an FDR of 1% and adjusted p-values < 0.001 was then generated for each explanatory variable.
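
The post-processing described above (combining z-scores across runs, rescaling by the genomic inflation factor, and applying Benjamini–Hochberg) can be illustrated with a minimal sketch. It is written in plain Python rather than R and is not the LEA implementation. It assumes that z-scores are combined by taking the per-locus median across runs, that the inflation factor is the median squared z-score divided by the median of a chi-squared distribution with one degree of freedom, and that the two thresholds (FDR and adjusted p-value) are applied jointly. All function names are hypothetical.

```python
import math
from statistics import NormalDist, median

# Median of a chi-squared distribution with 1 degree of freedom,
# obtained as the squared 75th percentile of the standard normal.
CHI2_1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def combine_z_scores(z_runs):
    """Per-locus median of z-scores; z_runs is a list of runs (assumed rule)."""
    return [median(locus) for locus in zip(*z_runs)]


def genomic_inflation_factor(z):
    """lambda = median(z^2) / median of chi-squared(1)."""
    return median(v * v for v in z) / CHI2_1_MEDIAN


def adjusted_p_values(z, lam):
    """Upper-tail chi-squared(1) p-values of z^2 / lambda.

    For one degree of freedom the upper tail at x equals erfc(sqrt(x / 2)).
    """
    return [math.erfc(math.sqrt(v * v / lam / 2.0)) for v in z]


def benjamini_hochberg(p):
    """Benjamini-Hochberg q-values, returned in the input order."""
    n = len(p)
    order = sorted(range(n), key=lambda i: p[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p[i] * n / rank)
        q[i] = running_min
    return q


def candidate_loci(z_runs, fdr=0.01, p_max=0.001):
    """Indices of loci passing both thresholds (joint use is an assumption)."""
    z = combine_z_scores(z_runs)
    lam = genomic_inflation_factor(z)
    p = adjusted_p_values(z, lam)
    q = benjamini_hochberg(p)
    return [i for i in range(len(z)) if q[i] <= fdr and p[i] < p_max]
```

In this sketch, `candidate_loci` would be called once per explanatory variable, with `z_runs` holding the z-scores from the 10 independent runs for that variable.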
