DNA-methylation preprocessing and exclusions

L. Raffington
T. Schwaba
M. Aikins
D. Richter
G. G. Wagner
K. P. Harden
D. W. Belsky
E. M. Tucker-Drob

DNA was extracted from buccal swabs collected using Isohelix IS SK-1S Dri-Capsules [20]. DNA extraction and methylation profiling were conducted by the Human Genomics Facility (HuGe-F) at the Erasmus Medical Center in Rotterdam, Netherlands. The Infinium MethylationEPIC v1 manifest B5 kit (Illumina, Inc., San Diego, CA) was used to assess methylation levels at 865,918 CpG sites.

DNAm preprocessing was primarily conducted with Illumina’s GenomeStudio software and the open-source R (version 4.2.0) packages ‘minfi’ [21] and ‘ewastools’ [22]. We generated 20 control metrics in GenomeStudio as described in Illumina’s BeadArray Controls Reporter Software Guide (note that similar parameters can be computed using the ewastools ‘control_metrics()’ function). Samples falling below the Illumina-recommended cut-offs were flagged and investigated further. Flagged samples were classified as failed if they showed (1) all types of poor bisulfite conversion and all types of poor bisulfite conversion background; (2) all types of bisulfite conversion background falling below 0.5; (3) all types of poor hybridization; and (4) all types of poor specificity (excluded n = 42).
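
The flagging step can be illustrated with a minimal Python sketch. It is not the GenomeStudio or ewastools implementation: the metric names (`metric_a`, `metric_b`) and the cut-off values are hypothetical placeholders, and the real metrics and thresholds are those given in Illumina's guide. The sketch covers only the flagging of samples that fall below a cut-off, not the subsequent classification of flagged samples as failed.

```python
# Hypothetical metric names and cut-offs, for illustration only. The real
# control metrics and thresholds come from Illumina's BeadArray Controls
# Reporter Software Guide.
CUTOFFS = {"metric_a": 1.0, "metric_b": 5.0}


def flag_samples(metrics_by_sample, cutoffs):
    """Return {sample_id: [metrics below cut-off]} for flagged samples only."""
    flagged = {}
    for sample_id, metrics in metrics_by_sample.items():
        low = sorted(name for name, cutoff in cutoffs.items()
                     if metrics[name] < cutoff)
        if low:
            flagged[sample_id] = low
    return flagged


# Made-up control metrics for three samples.
samples = {
    "S1": {"metric_a": 2.3, "metric_b": 9.1},
    "S2": {"metric_a": 0.4, "metric_b": 7.0},
    "S3": {"metric_a": 0.9, "metric_b": 3.2},
}
flagged = flag_samples(samples, CUTOFFS)
```

Keeping the list of offending metrics per sample, rather than a single pass/fail value, makes it easier to investigate flagged samples before deciding whether to exclude them.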

As a second step, we identified unreliable data points resulting from low fluorescence intensities by filtering on detection p-values, which are calculated by comparing fluorescence intensities to a noise distribution. We removed probes showing only background signal in a high proportion of samples (probes for which the proportion of samples with a detection p-value > 0.01 exceeded 0.1). We also removed probes for which a high proportion of samples had low bead numbers (probes for which the proportion of samples with a bead number < 3 exceeded 0.1). Further, we removed probes with SNPs at the CG or single-base extension position, as well as probes known to be cross-reactive on EPIC arrays [23, 24].
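
The two proportion-based probe filters can be sketched in plain Python as follows. This is an illustrative sketch rather than the code used in the study: the function names, probe labels and toy values are hypothetical, and only the thresholds (detection p-value > 0.01, bead number < 3, proportion > 0.1) are taken from the text.

```python
def failing_fraction(values, is_bad):
    """Fraction of samples in which a probe's value is judged bad."""
    return sum(1 for v in values if is_bad(v)) / len(values)


def keep_probe(det_p, beads, p_cut=0.01, min_beads=3, max_frac=0.1):
    """Keep a probe unless more than max_frac of samples fail either check."""
    too_noisy = failing_fraction(det_p, lambda p: p > p_cut) > max_frac
    too_few_beads = failing_fraction(beads, lambda b: b < min_beads) > max_frac
    return not (too_noisy or too_few_beads)


# Toy data for 10 samples: (detection p-values, bead numbers) per probe.
probes = {
    "probe_1": ([0.001] * 9 + [0.05], [12] * 10),        # 10% noisy: kept
    "probe_2": ([0.001] * 8 + [0.05, 0.2], [12] * 10),   # 20% noisy: removed
    "probe_3": ([0.001] * 10, [12] * 8 + [2, 1]),        # 20% low beads: removed
}
kept = [name for name, (det_p, beads) in probes.items()
        if keep_probe(det_p, beads)]
```

Both comparisons are strict, so a probe that fails in exactly 10% of samples is retained, matching the "> 0.1" wording above.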

We used minfi’s ‘preprocessNoob’ [25] to correct for background noise and color dye bias and ‘BMIQ’ to account for probe-type differences [26].

Cell composition was estimated using HEpiDISH, an iterative hierarchical version of the EpiDISH R package that uses robust partial correlations (https://github.com/sjczheng/EpiDISH). Because epithelial cells are the dominant cell type in buccal samples, we applied a threshold of 0.5 for the estimated epithelial cell proportion to reliably call a ‘buccal sample’ and excluded samples that did not meet this threshold (n = 28). All samples were from the same batch. The final analytic sample size after DNAm exclusions was n = 1058.
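
As a minimal illustration of the exclusion rule, the sketch below splits samples by their estimated epithelial proportion. It assumes the cell-type proportions have already been estimated. The key name `Epi`, the sample labels and the values are hypothetical, and treating a proportion of exactly 0.5 as passing is an assumption, since the text does not say how the boundary was handled.

```python
EPITHELIAL_THRESHOLD = 0.5  # threshold stated in the text


def split_by_epithelial(proportions, threshold=EPITHELIAL_THRESHOLD):
    """Split sample IDs into (kept, excluded) by epithelial proportion.

    Assumption: a proportion exactly equal to the threshold passes.
    """
    kept, excluded = [], []
    for sample_id, cell_props in proportions.items():
        if cell_props["Epi"] >= threshold:
            kept.append(sample_id)
        else:
            excluded.append(sample_id)
    return kept, excluded


# Made-up cell-type proportions for three samples.
estimated = {
    "S1": {"Epi": 0.82, "Other": 0.18},
    "S2": {"Epi": 0.35, "Other": 0.65},
    "S3": {"Epi": 0.50, "Other": 0.50},
}
kept, excluded = split_by_epithelial(estimated)
```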

In GSE111165 blood samples, DNAm algorithms were residualized for reference-free cell composition and plate [27].
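
To make "residualized" concrete, the sketch below shows one common way to do it: fit an ordinary least-squares regression of a DNAm measure on the covariates and keep the residuals. The text does not specify the model, so the linear form, the treatment of plate as a categorical covariate with dummy coding, and all names and values are assumptions for illustration.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes a is square and non-singular (full-rank design matrix).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def residualize(y, covariates, plates):
    """OLS residuals of y on an intercept, numeric covariates, plate dummies."""
    levels = sorted(set(plates))[1:]  # first plate is the reference level
    design = [[1.0] + list(cov) + [1.0 if p == lev else 0.0 for lev in levels]
              for cov, p in zip(covariates, plates)]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    return [yi - sum(b * v for b, v in zip(beta, row))
            for row, yi in zip(design, y)]


# Made-up data: one cell-composition component and two plates.
cell_comp = [[0.1], [0.4], [0.2], [0.7], [0.5], [0.9]]
plates = ["P1", "P1", "P1", "P2", "P2", "P2"]
dnam_score = [1.2, 2.5, 1.4, 4.9, 3.6, 5.1]
residuals = residualize(dnam_score, cell_comp, plates)
```

By construction the residuals sum to approximately zero and are uncorrelated with every covariate in the model, so downstream analyses of the residualized measure are not driven by linear effects of cell composition or plate.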
