Breakpoint analyses

Eladio J. Márquez; Cheng-han Chung; Radu Marches; Robert J. Rossi; Djamel Nehar-Belaid; Alper Eroglu; David J. Mellert; George A. Kuchel; Jacques Banchereau; Duygu Ucar


This method is extracted from research article: Nat Commun, Feb 2020

Sexual-dimorphism in human immune system aging

DOI: 10.1038/s41467-020-14396-9


We investigated systemic chronological signatures in temporal peaks by testing, in each cluster, for the existence of “breakpoints,” i.e., short age intervals characterized by significant differences in accessibility between the intervals preceding and following them. For each age t in the sampled age interval from t_min to t_max, we tested for a mean difference in accessibility between subjects with ages in the interval of span w preceding t vs. the interval of span w following t (i.e., t − w vs. t + w), where w represents a variable window span parameter, and plotted the observed p-values (as −log10 P, or loginvp) as a function of age (t) to identify maxima that suggested the presence of discrete breakpoints. These tests were carried out on normalized and model-adjusted accessibility data corresponding to the ATAC-seq peaks associated with each sex-specific and common cluster identified as trending by ARIMA as described above. Since there were many more peaks than subjects for any given comparison, we used PCA to reduce the dimensionality of each cluster to n = 3 PCs, and used MANOVA on these three dimensions to compute p-values at each tested value of t.

For any given value of w, offset values for t_min and t_max were adjusted to match the age of available samples in the study. For example, a window span of 5 years required a t_min = 27 if the youngest available subject was 22 years old, and a t_max = 85 if the oldest available subject was 90 years old. For a given value of w, results from tests contrasting younger vs. older intervals would vary depending on sample size and imbalance, with statistical power increasing with the size of the window span. To take advantage of this effect, we deployed a multi-scale algorithm in which we carried out tests using w values ranging from 10 to 20 years in order to identify breakpoints that were maximally supported under multiple window spans.
Due to sample sparsity and variation, however, tests carried out under varying values of w may be unevenly affected by edge effects and influential outlier points. A comparison may appear strongly significant because of the presence of outliers and the partial overlap of a sampled interval with a breakpoint, which limits the ability of the method to locate breakpoints precisely. To limit these effects and increase the robustness of the tests, we smoothed the loginvp distributions by fitting LOESS regressions to each comparison (i.e., each set of tests with the same w value) under a range of smoothing bandwidth parameters (i.e., bw = 0.25, 0.30, …, 0.70, 0.75). We combined the resulting 11 p-values at every sampled age using Fisher’s method, reapplied LOESS smoothing to the resulting distribution, and used numerical differentiation to determine whether each age was predicted to be a minimum or a maximum. We then marked every maximum as a significant breakpoint candidate if it satisfied both a parametric criterion, i.e., significance of the Fisher method-combined p-values (χ² test), and a heuristic criterion, namely that the distance between this local maximum and the nearest minimum equaled or exceeded 25% of the value of the global maximum.

The procedure described above results in a smoothed loginvp distribution for each w value, each comprising a series of points including maxima and minima, such that slightly different maxima can be estimated for different w values. Finally, we used Gaussian mixture modeling on the distribution of these maxima, as implemented in the R package mclust, to group loginvp maxima obtained from different window spans into cohesive breakpoint intervals, whose medians and ranges we report herein for each cluster. Since breakpoints are independently calculated for each cluster, observed overlaps are likely the result of aging-related events with a genome-wide impact.
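The combination and peak-calling steps can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions: the function names are hypothetical, the significance cutoff `alpha = 0.05` is not stated in the text, the "distance" between a maximum and its nearest minimum is read here as a difference in loginvp height, and the "nearest" minimum is taken to be the closest in age. LOESS smoothing and the Gaussian mixture step are omitted; the input is taken to be an already smoothed loginvp curve.

```python
import math


def fisher_combine(pvalues):
    """Combine p-values with Fisher's method.

    X = -2 * sum(ln p) is referred to a chi-square distribution with 2k
    degrees of freedom; for even degrees of freedom the upper tail has the
    closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)


def local_extrema(y):
    """Indices of interior maxima and minima, found from sign changes of
    the first difference (a simple numerical derivative)."""
    maxima, minima = [], []
    for i in range(1, len(y) - 1):
        left, right = y[i] - y[i - 1], y[i + 1] - y[i]
        if left > 0 and right < 0:
            maxima.append(i)
        elif left < 0 and right > 0:
            minima.append(i)
    return maxima, minima


def candidate_breakpoints(ages, loginvp, alpha=0.05, min_drop=0.25):
    """Ages of maxima passing both criteria (alpha is an assumed cutoff)."""
    maxima, minima = local_extrema(loginvp)
    threshold = min_drop * max(loginvp)  # 25% of the global maximum
    keep = []
    for i in maxima:
        if not minima:
            continue  # no minimum to compare against
        nearest = min(minima, key=lambda j: abs(ages[j] - ages[i]))
        significant = loginvp[i] >= -math.log10(alpha)
        if significant and loginvp[i] - loginvp[nearest] >= threshold:
            keep.append(ages[i])
    return keep


# Toy smoothed curve, for illustration only (not study data).
ages = list(range(30, 41))
loginvp = [0.5, 1.0, 2.0, 3.0, 2.0, 0.8, 0.5, 0.9, 1.0, 0.9, 0.7]
combined = fisher_combine([0.5, 0.5])
candidates = candidate_breakpoints(ages, loginvp)
```

In this toy curve the peak at age 33 passes both criteria, while the shallow bump at age 38 fails the significance cutoff. In the procedure described above, `fisher_combine` would be applied at every sampled age to the 11 p-values obtained under the different bandwidths, before the second round of smoothing.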

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
