4.5. Differential gene expression analysis model

JD Jayanta Kumar Das
NB Nirad Banskota
JC Julián Candia
MG Michael E. Griswold
MO Melissa Orenduff
RC Rafael de Cabo
DC David L. Corcoran
SD Sai Krupa Das
SD Supriyo De
KH Kim Marie Huffman
VK Virginia B. Kraus
WK William E. Kraus
CM Corby K. Martin
SR Susan B. Racette
LR Leanne M. Redman
BS Birgit Schilling
DB Daniel W. Belsky
LF Luigi Ferrucci
request Request a Protocol
ask Ask a question
Favorite

Since this study focused on protein‐coding transcripts, all non‐coding RNAs were excluded. Also excluded were RNAs with read count = 0, in 50% or more samples in any of the study subgroups (baseline, 12‐, 24‐month) of CR or AL (n = 3450). Then, the raw read count of the remaining 16,450 RNAs was normalized and converted to log2 transformed counts per million (CPM), that is, log2(CPM), using the edgeR package implemented in R (Robinson et al., 2010). Next, for each RNA with log2(CPM), linear mixed effect models (LMM) were built using the lme4 package implemented in R (Bates et al., 2015) to estimate associations of randomization group (CR vs. AL) with baseline RNA expression levels and RNA expression trajectories over time, adjusting for baseline age, baseline bmi, sex, race, and sequence batch in the mean model, and utilizing random intercepts in the variance model. A benefit of using LMMs in this study is their operation under an inherent Missing At Random (MAR) assumption (Griswold et al., 2021). MAR was supported from both similarities in characteristics between those dropping out versus those completing the study, and also from the reasons for missingness, which were primarily due to biopsy pain avoidance and not due to muscle outcomes of interest reported here. RNA expression trajectories (linear changes over time) for each randomization group, and differences between these groups in their trajectories (estimated by the interaction term between randomization group and linear time‐years) were estimated in our primary LMMs as shown in Equation 1. We then extracted p‐values (default t‐tests use Satterthwaite's method [“lmerModLmerTest”]) and β‐values (represent trajectory slope difference in CR compared to AL) from LMMs that we considered likely to identify the most significant protein‐coding transcripts that changed differentially between CR and AL over time. Ranking of protein‐coding transcripts using extracted p‐values with signs of β‐values from LMM models (linear time) was used in enrichment analyses (discussed below).

For expression results, we also incorporated additional information using similar LMM models but replaced the linear time terms with visit indicator functions (non‐linear time) to estimate differentially expressed transcripts at the specific 12‐ and 24‐month visits between the CR and AL groups (Equation 1). Similarly, we also extracted p‐values and β‐values from LMMs for 12‐ and 24‐month expression between the CR and AL groups. Both approaches can be described via the equation:

where β1 is the main effect (slope) that represents the trajectory difference between CR and AL groups, time is either continuous (years from baseline) or discrete (baseline, 12‐, and 24‐month visits) and ui is a random intercept to account for participant differences. Quantitatively, significant protein‐coding transcript results were compared for both approaches.

We used a threshold of p < 0.05 (unadjusted) for differential gene expression analysis. The top most significant genes that relate to influencing biological mechanisms through calorie restriction were considered for further characterization and exploration through literature review, and also using GeneCards (Stelzer et al., 2016). The package ggplot2 implemented in R (Wickham et al., 2016) along with various other R packages were used for generating various plots and linear smooth trajectories for selected transcripts.

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

post Post a Question
0 Q&A