2.4. Calculating the Mahalanobis distance

D.C. Dean, III; N. Lange; B.G. Travers; M.B. Prigge; N. Matsunami; K.A. Kellett; A. Freeman; K.L. Kane; N. Adluru; D.P.M. Tromp; D.J. Destiche; D. Samsin; B.A. Zielinski; P.T. Fletcher; J.S. Anderson; A.L. Froehlich; M.F. Leppert; E.D. Bigler; J.E. Lainhart; A.L. Alexander

This method is extracted from research article: Neuroimage Clin, Jan 2017

Multivariate characterization of white matter heterogeneity in autism spectrum disorder

DOI: 10.1016/j.nicl.2017.01.002

The Mahalanobis distance (D_M; Mahalanobis, 1936) is a multivariate extension of the Euclidean distance that measures how far each member of a set of multivariate measures lies from the mean of their multivariate distribution. For each subject, D_M is calculated using the following formula:

$$D_M = \sqrt{(\vec{x} - \mu)\, S^{-1}\, (\vec{x} - \mu)^{T}} \qquad (1)$$

where $\vec{x}$ is the set of multivariate neuroimaging observations for an individual, μ is the mean of the multivariate distribution of neuroimaging measures, and S is the variance-covariance matrix of the measures. For example, if m measures are collected from each individual, $\vec{x}$ and μ are each 1 × m vectors, S is an m × m matrix, and D_M from Eq. (1) is a scalar. In this way, D_M accounts for the variance of individual observations as well as the covariance between observations, analogous to the Euclidean distance in univariate analysis. In the present study, we estimate D_M for individuals with ASD from their corresponding DTI brain measures, using the TDC group as the population reference. In this case, D_M quantifies how far an ASD individual's brain measures lie from the multivariate mean of the TDC population, with larger D_M representing greater distance from the center of the typically developing population.
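As an illustration of Eq. (1), the following is a minimal Python sketch that estimates μ and S from a small reference group and computes D_M for one observation. It is not the authors' implementation (the study's analyses were run in R). The function names and the toy reference data are assumptions made for this example. The sketch solves S y = (x − μ) by Gaussian elimination instead of forming S⁻¹ explicitly.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_matrix(rows):
    """Sample variance-covariance matrix (n - 1 denominator)."""
    n = len(rows)
    mu = mean_vector(rows)
    centered = [[value - center for value, center in zip(row, mu)] for row in rows]
    size = len(mu)
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(size)]
            for i in range(size)]

def solve(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(aug[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (aug[r][n] - partial) / aug[r][r]
    return y

def mahalanobis(x, mu, s):
    """D_M = sqrt((x - mu) S^{-1} (x - mu)^T), as in Eq. (1)."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    y = solve(s, d)  # y = S^{-1} (x - mu)^T
    return math.sqrt(sum(di * yi for di, yi in zip(d, y)))

# Hypothetical reference group with m = 2 measures per individual.
reference = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
mu = mean_vector(reference)        # [1.0, 1.0]
s = covariance_matrix(reference)   # diagonal, variance 4/3 per measure
d_m = mahalanobis([3.0, 1.0], mu, s)  # sqrt(3), about 1.732
```

When S is the identity matrix, the same function reduces to the ordinary Euclidean distance, which is the sense in which D_M extends it.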

In constructing D_M from longitudinal measurements, it is important to account for neurodevelopmental processes (Dean et al., 2014b; Dean et al., 2014a; Lebel and Beaulieu, 2011; Lebel et al., 2012; Snook et al., 2005). Generalized additive mixed models (GAMMs) were fit to the regional developmental trajectories of the DTI parameters (FA, MD, AD, and RD) of the TDC group to characterize the observed age-related white matter changes and establish a normative growth trajectory for each brain region. GAMMs were chosen because they were designed specifically for cohort-sequential longitudinal designs (Wood, 2006; Wood, 2012) and can account for repeated measurements from the same individual. Furthermore, since the growth model that describes white matter is unknown, the semi-parametric nature of these spline models provides more flexibility in capturing subtle developmental changes than parametric growth models (Travers et al., 2015b). Longitudinal modeling analyses were performed using R version 3.2.1 (R Development Team, 2014), while accounting for the nuisance variables of head coil (due to the upgrade discussed earlier) and total motion index.

Upon determining the best-fit models of the TDC group, these models were used to predict FA, MD, AD, and RD along the modeled TDC growth trajectory for every ASD participant at each time point and for each brain region. The differences between the participants' parameter values and these predicted values (i.e., the model residuals) were calculated, corresponding to the vertical distance between the participant parameter measurements and the TDC reference growth trajectory. D_M for each time point was calculated from these residuals using Eq. (1), where $\vec{x} - μ$ corresponds to the difference between observed measurements ( $\vec{x}$ ) and modeled values (μ), and S is the variance-covariance matrix of the modeled residuals from the TDC group.
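The residual step described above can be sketched in Python as follows. This is a simplified illustration, not the study's model: an ordinary least-squares line in age stands in for the GAMM, with no random effects and no nuisance covariates. The visit values, the measure names used as dictionary keys, and the helper names `fit_line` and `residual_vector` are all assumptions made for the example.

```python
def fit_line(ages, values):
    """Ordinary least-squares line: value ~ intercept + slope * age.

    A plain straight line is used here only as a stand-in for the
    spline-based GAMM trajectory described in the text.
    """
    n = len(ages)
    mean_age = sum(ages) / n
    mean_val = sum(values) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (v - mean_val) for a, v in zip(ages, values))
    slope = sxy / sxx
    return mean_val - slope * mean_age, slope

def residual_vector(age, observed, models):
    """Observed minus predicted value for each measure at a given age."""
    return {name: observed[name] - (intercept + slope * age)
            for name, (intercept, slope) in models.items()}

# Hypothetical reference (TDC) visits: age in years and two measures.
reference_visits = [
    {"age": 10.0, "FA": 0.40, "MD": 0.80},
    {"age": 20.0, "FA": 0.50, "MD": 0.75},
    {"age": 30.0, "FA": 0.60, "MD": 0.70},
]
ages = [visit["age"] for visit in reference_visits]
models = {name: fit_line(ages, [visit[name] for visit in reference_visits])
          for name in ("FA", "MD")}

# Hypothetical ASD visit, compared with the reference trajectory at its age.
asd_visit = {"age": 20.0, "FA": 0.45, "MD": 0.80}
residuals = residual_vector(asd_visit["age"], asd_visit, models)
```

The resulting residual vector plays the role of $\vec{x} - μ$ in Eq. (1), and the covariance of the reference group's own residuals plays the role of S.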

D_M was similarly calculated for each TDC individual. However, to avoid including an individual's measurements in the model fitting when establishing the reference growth trajectory, a leave-one-out approach was used when modeling regional FA, MD, AD, and RD developmental trajectories. For a given TDC subject, that subject's FA, MD, AD, and RD longitudinal measurements were removed prior to modeling, and participant-specific residuals and D_M were then calculated as before. This process was repeated for each TDC participant.
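The leave-one-out logic can be sketched as below. The sketch shows only the hold-out loop for a single measure. The reference model is passed in as a pair of callables, and the mean-value model used in the example is a deliberately trivial stand-in for the GAMM. The function names and the toy data are assumptions made for this illustration.

```python
def leave_one_out_residuals(visits_by_subject, fit, predict):
    """For each subject, fit the reference model without that subject's
    visits, then return the residuals of the held-out visits."""
    residuals = {}
    for held_out, own_visits in visits_by_subject.items():
        # Pool the visits of every other subject as training data.
        training = [visit
                    for subject, visits in visits_by_subject.items()
                    if subject != held_out
                    for visit in visits]
        model = fit(training)
        residuals[held_out] = [value - predict(model, age)
                               for age, value in own_visits]
    return residuals

# Trivial stand-in model (an assumption): the mean of the training values.
def fit_mean(training):
    return sum(value for _, value in training) / len(training)

def predict_mean(model, age):
    return model  # the mean model ignores age

# Hypothetical reference subjects, each with (age, value) visits.
tdc = {
    "s1": [(10.0, 1.0), (12.0, 1.0)],
    "s2": [(11.0, 2.0)],
    "s3": [(13.0, 3.0)],
}
loo = leave_one_out_residuals(tdc, fit_mean, predict_mean)
```

Holding out all visits of a subject at once, not one visit at a time, keeps that subject's repeated measurements from informing their own reference trajectory.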

After calculating D_M for each longitudinal time point, an average across these time points was computed for each participant, providing a single, representative D_M value for each individual. Supplementary Fig. 1 displays a representative schematic illustrating the process of calculating D_M. Distributions of Mahalanobis distances were generated for both the ASD and TDC groups and the Bhattacharyya coefficient (Mo et al., 2015) was computed to assess the degree of overlap between the group distributions, where smaller Bhattacharyya coefficients correspond to a lesser degree of overlap. D_M was additionally calculated by considering regional DTI parameters (FA, MD, AD, and RD) separately.
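The averaging and overlap steps can be illustrated with the short Python sketch below. The text does not state how the group distributions were estimated, so the use of normalized histograms over shared bin edges is an assumption, as are the function names and the example values. For discrete distributions p and q over the same bins, the Bhattacharyya coefficient is the sum over bins of sqrt(p_i q_i). It equals 1 for identical distributions and 0 for distributions with no shared bins.

```python
import math

def mean_per_participant(distances_by_participant):
    """Average D_M across time points for each participant."""
    return {pid: sum(values) / len(values)
            for pid, values in distances_by_participant.items()}

def histogram(values, edges):
    """Normalized histogram over bins [edges[i], edges[i + 1]).

    The last bin also includes its right edge. Values outside the
    edges are ignored, so the edges should span all of the data.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]

def bhattacharyya_coefficient(p, q):
    """Sum over bins of sqrt(p_i * q_i) for two normalized histograms."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

# Hypothetical per-time-point D_M values for two participants.
averaged = mean_per_participant({"p1": [1.0, 3.0], "p2": [2.0, 2.5, 3.0]})

# Hypothetical per-participant mean D_M values for each group.
edges = [0.0, 1.0, 2.0, 3.0]
asd_hist = histogram([0.5, 1.5, 1.5, 2.5], edges)   # [0.25, 0.5, 0.25]
tdc_hist = histogram([0.5, 0.5, 1.5, 1.5], edges)   # [0.5, 0.5, 0.0]
overlap = bhattacharyya_coefficient(asd_hist, tdc_hist)
```

The value of the coefficient depends on the bin edges chosen, so both groups must be binned on the same edges for the comparison to be meaningful.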

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
