Correlation between Gene Expression and Flowering Time in a Natural Population

Shirin Glander
Fei He
Gregor Schmitz
Anika Witten
Arndt Telschow
Juliette de Meaux

We analyzed two published sets of natural ecotypes for which both genome-wide expression profiles and flowering time estimates were available. The first data set comprised 138 lines from Sweden scored for both flowering time (for plants grown at 16-h light–8-h dark at constant 16°C) and gene expression in whole rosettes collected at the 9-true-leaf stage (Dubin et al. 2015; Sasaki et al. 2015). For this first data set, gene expression and flowering time were determined in the same experiment. The second data set combined data from two sources. RNA extracted from 7-day-old seedlings of 144 genotypes grown on agar plates in long days had been sequenced (Schmitz et al. 2013) and expression levels quantified as quantile-normalized fragments per kilobase per million reads (FPKM). For 52 of these genotypes, flowering time, measured in cumulative photothermal units, had been scored in the field (Brachi et al. 2010). Photothermal units sum up the combination of temperature and day length and thus provide an estimate of the duration of the favorable season.

Expression counts were ln(x + 1)-transformed so that genes with zero expression could be included, and a Spearman correlation coefficient between flowering time and expression level was computed for each gene. P values were adjusted for false discovery rate using the p.adjust function in R (Benjamini and Hochberg 1995; Yekutieli and Benjamini 1999). A Kolmogorov–Smirnov test was used to compare the distribution of Spearman correlation coefficients ρ for flowering time genes and for immunity genes with the distribution of ρ for the 22,686 genes for which gene expression was quantified. Gene enrichments were tested using hypergeometric tests in R. The GO enrichment analysis was performed with the Gene Set Enrichment Analysis (GSEA) test, which is akin to nonparametric Kolmogorov–Smirnov tests, was first described by Subramanian et al. (2005), and is implemented in the “topGO” R package (Alexa and Rahnenfuhrer 2010). We further applied the elim procedure, available in this package, which calculates the enrichment significance of parent nodes after eliminating the genes of significant child nodes. This controls for the dependency among nested parent–child GO categories so that the significance of each enrichment can be interpreted without overconservative P value corrections for multiple testing (Alexa et al. 2006). To test the impact of population structure on the correlation, we ran a mixed model using the lmekin function in R. For each gene, we used gene expression level as the dependent variable. Flowering time was used as the independent variable, and a kinship matrix, generated from a matrix of SNPs segregating among Swedish genotypes (Dubin et al. 2015), was included as a random effect. The estimate of the flowering time effect was extracted. This allowed us to compare the distributions of estimates observed for the whole genome, the subset of flowering time genes, and the subset of defense genes.
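The first steps of the analysis above (ln(x + 1) transformation, a per-gene Spearman correlation with flowering time, and Benjamini–Hochberg adjustment) can be sketched as follows. This is a minimal illustration in standard-library Python, not the authors' R code: the function names and the example values are hypothetical, and P values are taken as given input because the text does not say how they were computed. The adjustment follows the same rule as R's p.adjust with method = "BH", which is assumed here to be the method used.

```python
import math

def log1p_transform(counts):
    """Natural log of (count + 1), so zero counts map to 0 instead of -inf."""
    return [math.log1p(c) for c in counts]

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share the mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, in the original input order."""
    m = len(pvalues)
    # Walk from the largest P value down, enforcing monotonicity with a running minimum.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # rank of this P value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical example: one gene measured in five lines.
flowering_time = [50, 62, 71, 80, 95]
expression = log1p_transform([0, 3, 10, 8, 40])
rho = spearman_rho(flowering_time, expression)

# Hypothetical per-gene P values for four genes.
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.20])
```

Because the Spearman coefficient depends only on ranks, the ln(x + 1) transformation does not change ρ itself; it matters for steps that use the expression values directly, such as the mixed model.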
