Metabolomic profiling: The raw metabolomic data underwent sum normalization, autoscaling, and log transformation. Principal component analysis (PCA) was used to detect outliers; any sample deviating by more than 3 standard deviations from the center of the first three principal components was considered an outlier (i.e., outside the roughly 99.7% interval of a normal distribution). No outliers were identified. To predict sample diagnosis, a logistic regression model was fitted using the first five principal components together with sample age, sex, and post-mortem interval (PMI) as inputs. To identify differentially expressed metabolites, a robust linear regression model was fitted to each metabolite, with sample diagnosis, age, sex, and PMI as covariates. The p-values for diagnosis were calculated after empirical Bayes treatment of the fitted models. Metabolites with a false discovery rate (FDR) q-value below 0.05 were deemed significantly differentially expressed in the disease. To estimate metabolite pathway enrichment, the metabolites were ranked by their log-transformed p-value multiplied by the sign of the fold change. This ranking placed the metabolites with the most significant increase in abundance at the top of the list and those with the most significant decrease in abundance at the bottom. Metabolites without an HMDB ID were removed from the ranked list. Metabolic pathways were downloaded from the KEGG database using the R package multiGSEA, and metabolite set enrichment was then calculated with 10,000 permutations using the fgsea package (version 1.16.0).
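The ranking step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the log transformation is -log10(p), which is what places the most significantly increased metabolites at the top, and that all p-values are greater than zero. The function name `rank_metabolites`, the dictionary keys, and the HMDB identifiers and values in the example are hypothetical placeholders.

```python
import math

def rank_metabolites(results):
    """Rank metabolites by -log10(p) * sign(log fold change).

    `results` is a list of dicts with the hypothetical keys
    'hmdb_id', 'p_value' and 'log_fc'. Entries without an HMDB ID
    are dropped, as described in the text.
    """
    ranked = []
    for row in results:
        if not row.get("hmdb_id"):
            continue  # no HMDB ID: excluded from the ranked list
        # sign is +1 for increased, -1 for decreased, 0 for no change
        sign = (row["log_fc"] > 0) - (row["log_fc"] < 0)
        score = -math.log10(row["p_value"]) * sign
        ranked.append((row["hmdb_id"], score))
    # Highest score first: most significant increases at the top,
    # most significant decreases at the bottom.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked

# Illustrative input only (not data from the study).
example = [
    {"hmdb_id": "HMDB0000001", "p_value": 0.001, "log_fc": 0.8},
    {"hmdb_id": "HMDB0000002", "p_value": 0.0001, "log_fc": -1.2},
    {"hmdb_id": None, "p_value": 0.00001, "log_fc": 2.0},
    {"hmdb_id": "HMDB0000003", "p_value": 0.5, "log_fc": 0.1},
]
ranking = rank_metabolites(example)
```

A ranked list of this form is the kind of input a preranked enrichment tool such as fgsea expects: one signed score per feature.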
DNA methylation profiling: The raw .idat files were read using the Bioconductor package minfi, which was also used to flag failed methylation probes. Several quality control measures were applied to identify outliers in the data. First, samples in which more than 20% of probes failed were considered outliers. Second, the median intensity of the unmethylated (U) and methylated (M) probes was calculated for each sample, and samples in the lower-left corner of the resulting plot were marked as outliers. Third, sex was predicted by comparing the median signal on the X and Y chromosomes, and samples with a mismatch between the predicted and recorded sex were marked as outliers. Finally, principal component analysis of the centered, unnormalized β value matrix was performed, and samples deviating by more than 2 standard deviations from the mean of the first three principal components were marked as outliers. Noob normalization was used to normalize the samples, and methylation β values were extracted for the remaining probes. To adjust for the effect of sample position on the EPIC array, the empiricalBayesLM function from the WGCNA package was used, with the effect modeled as a second-degree polynomial. The proportion of neuronal cells in each sample was estimated using flow-sorted PFC samples as a reference and the estimateCellCounts function from the minfi package. The DNA methylation age (DNAmAge) of each sample was estimated using the ENmix package. Age acceleration was defined as the residuals of a linear model in which DNAmAge was the dependent variable and chronological age, sex, post-mortem interval, and proportion of neuronal cells were the independent variables. To identify differentially methylated CpGs, robust linear regression was performed using the R package limma with sample diagnosis, age, sex, post-mortem interval, and proportion of neuronal nuclei as covariates. The p-values for diagnosis were obtained after empirical Bayes treatment of the fitted models. Cytosines with a false discovery rate (FDR) q-value of less than 0.05 were considered significantly altered in HD individuals.
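The FDR threshold above can be illustrated with a short sketch. The text does not state which FDR procedure was used; the example assumes the Benjamini-Hochberg adjustment, which is a common default, and the function name `benjamini_hochberg` and the example p-values are purely illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values).

    Assumption: the FDR procedure is Benjamini-Hochberg; the
    protocol itself only reports 'FDR q < 0.05'.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values

# Illustrative p-values only (not data from the study).
p = [0.001, 0.02, 0.03, 0.8]
q = benjamini_hochberg(p)
significant = [qi < 0.05 for qi in q]
```

With these four illustrative p-values, the first three features pass the q < 0.05 threshold and the last does not.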
To conduct an epigenome enrichment analysis, cytosines were mapped to gene names using the UCSC_RefGene_Name column of the Bioconductor EPIC array annotation package IlluminaHumanMethylationEPICanno.ilm10b4.hg19. When a CpG locus was annotated with several genes, some of them more than once, the gene most frequently associated with the locus was chosen. For genes that mapped to multiple cytosines, the cytosine with the smallest p-value was selected. The genes were then ranked by the significance of the selected cytosine multiplied by the sign of the fold change. For the pathway enrichment analysis, KEGG pathways were downloaded and the fgsea function was used.
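The CpG-to-gene collapsing described above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: it assumes that UCSC_RefGene_Name holds semicolon-separated gene symbols, that significance is expressed as -log10(p), and that ties in gene frequency are broken by first occurrence (the text does not specify a tie-breaking rule). The function names, dictionary keys, and gene labels are hypothetical.

```python
import math
from collections import Counter

def most_frequent_gene(refgene_name):
    """Pick the gene symbol listed most often for one CpG."""
    genes = [g for g in refgene_name.split(";") if g]
    if not genes:
        return None  # CpG not annotated to any gene
    # Ties resolve to the first-listed gene (an assumption).
    return Counter(genes).most_common(1)[0][0]

def gene_level_ranking(cpg_results):
    """Keep the smallest-p CpG per gene, then rank genes by
    -log10(p) * sign(effect), highest score first."""
    best = {}
    for row in cpg_results:
        gene = most_frequent_gene(row["refgene_name"])
        if gene is None:
            continue
        if gene not in best or row["p_value"] < best[gene]["p_value"]:
            best[gene] = row
    scores = {}
    for gene, row in best.items():
        sign = (row["effect"] > 0) - (row["effect"] < 0)
        scores[gene] = -math.log10(row["p_value"]) * sign
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative input only (placeholder gene labels and values).
cpgs = [
    {"refgene_name": "GENE_A;GENE_A;GENE_B", "p_value": 0.01, "effect": -0.05},
    {"refgene_name": "GENE_A", "p_value": 0.001, "effect": 0.02},
    {"refgene_name": "GENE_B;GENE_B", "p_value": 0.0001, "effect": -0.1},
    {"refgene_name": "", "p_value": 0.00001, "effect": 0.3},
]
gene_ranking = gene_level_ranking(cpgs)
```

Here the first CpG is assigned to GENE_A because that symbol appears twice in its annotation, but the second CpG represents GENE_A in the ranking because its p-value is smaller.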