Imputation

Paul J. Hop
René Luijk
Lucia Daxinger
Maarten van Iterson
Koen F. Dekkers
Rick Jansen
Joyce B. J. van Meurs
Peter A. C. ’t Hoen
M. Arfan Ikram
Marleen M. J. van Greevenbroek
Dorret I. Boomsma
P. Eline Slagboom
Jan H. Veldink
Erik W. van Zwet
Bastiaan T. Heijmans

Since DNA methylation and RNAseq data are informative for age, sex, and white blood cell composition [8790], we used these data to impute missing values of these variables. Missing observations were imputed separately for the RNAseq and DNA methylation data because the two datasets overlap only partially. Missing observations in the measured white blood cell counts (WBCC) were imputed using the R package pls, adjusting for reported age and sex, as described earlier (https://molepi.github.io/DNAmArray_workflow/05_Predict.html) [20].

For missing age and sex measurements, we compared the performance of the elastic net, LASSO, ridge, and partial least squares (pls) methods. To evaluate these models, the data were randomly split into a training set (2/3) and a test set (1/3). This procedure was repeated 25 times, each time calculating the performance in the test set (mean squared error for age and F1-score for sex). The procedure was run with varying numbers of input variables (50 to 10,000), where the input variables were selected based on their correlation with the outcome. The model and number of input variables that gave the best average performance in the test sets were selected to impute missing data.

The average correlation between predicted and reported age in the test sets was 0.98 for the DNA methylation data and 0.92 for the RNAseq data. Sex was almost perfectly predicted (accuracy ≈ 0.995) in both the DNA methylation and RNAseq data.
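The evaluation procedure described above (repeated 2/3 to 1/3 random splits, correlation-based selection of input variables, and scoring on the held-out part) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' code, which was written in R. All function names, the ridge penalty `alpha`, and the synthetic data are assumptions made for the example, and only ridge regression is shown as one of the four compared methods. Selecting the input variables within the training split only is also an assumption, since the text does not specify where the selection was done.

```python
import numpy as np


def top_correlated(X, y, k):
    """Return indices of the k columns of X with the largest |Pearson r| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns get a correlation of 0
    r = (Xc.T @ yc) / denom
    return np.argsort(-np.abs(r))[:k]


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on centered data; returns a predict function."""
    mu, y_mean = X.mean(axis=0), y.mean()
    Xc = X - mu
    penalty = alpha * np.eye(X.shape[1])
    beta = np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ (y - y_mean))
    return lambda X_new: (X_new - mu) @ beta + y_mean


def mse(y_true, y_pred):
    """Mean squared error (the metric used for age in the text)."""
    return float(np.mean((y_true - y_pred) ** 2))


def f1_score(y_true, y_pred):
    """F1-score for binary labels coded 0/1 (the metric used for sex in the text)."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def repeated_split_score(X, y, k, fit, metric, n_repeats=25, seed=0):
    """Average test-set metric over repeated random 2/3 train, 1/3 test splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = (2 * n) // 3
    scores = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        # Assumption: variables are selected on the training part only.
        cols = top_correlated(X[train], y[train], k)
        predict = fit(X[train][:, cols], y[train])
        scores.append(metric(y[test], predict(X[test][:, cols])))
    return float(np.mean(scores))


# Synthetic stand-in for an omics matrix (samples x features) and reported age.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 200))
weights = np.array([4.0, -3.0, 2.0, 2.0, -1.0])
age = 50 + X[:, :5] @ weights + rng.normal(scale=1.0, size=300)

# Compare several numbers of input variables and keep the best (lowest MSE).
results = {k: repeated_split_score(X, age, k, fit_ridge, mse) for k in (5, 20, 50)}
best_k = min(results, key=results.get)
```

The same loop would be run once per candidate model, passing a different `fit` function each time, and with `f1_score` as the metric (maximized rather than minimized) when the outcome is sex.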
