2.7 Tree-based concepts of bivariate variables interaction

Jean-Eudes Dazard, Hemant Ishwaran, Rajeev Mehlotra, Aaron Weinberg, Peter Zimmerman

Following Ishwaran’s initial work on a pairwise interaction statistic (Ishwaran, 2007), a paired interaction statistic between variables xj and xk, denoted VIMP(xj, xk), can be defined for j, k ∈ {1, …, p}, j < k, as VIMP(xj, xk) = |PIMP(xj, xk) − AIMP(xj, xk)|. Here PIMP(xj, xk) is the paired (joint) variable importance of xj and xk, defined as the amount by which prediction error increases (or decreases) when xj and xk are simultaneously perturbed, and AIMP(xj, xk) is the additive variable importance, defined as the sum of the individual variable importances: AIMP(xj, xk) = VIMP(xj) + VIMP(xk). If the univariate variable importance of each variable is significantly large, a large VIMP(xj, xk) indicates a possible pairwise interaction (see Ishwaran, 2007, for more details).
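The arithmetic of this statistic can be illustrated with a minimal Python sketch. It assumes the joint and univariate importances have already been estimated elsewhere; the function name `paired_interaction`, the dictionary layout, and all numbers are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def paired_interaction(pimp, vimp):
    """Return {(j, k): |PIMP(x_j, x_k) - AIMP(x_j, x_k)|} for all pairs j < k.

    pimp : dict mapping (j, k), j < k, to the joint importance PIMP(x_j, x_k)
    vimp : list of univariate importances VIMP(x_j)
    """
    stats = {}
    for j, k in combinations(range(len(vimp)), 2):
        aimp = vimp[j] + vimp[k]              # additive importance AIMP(x_j, x_k)
        stats[(j, k)] = abs(pimp[(j, k)] - aimp)
    return stats

# Made-up importances for three variables (not taken from any data set).
vimp = [0.10, 0.05, 0.02]
pimp = {(0, 1): 0.30, (0, 2): 0.12, (1, 2): 0.07}

result = paired_interaction(pimp, vimp)
# Pair (0, 1) stands out: its joint importance (0.30) is twice the additive one (0.15).
largest = max(result, key=result.get)
```

In this toy input, only the pair (0, 1) departs from additivity, so it is the only candidate interaction under this statistic.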

However, we do not necessarily want to assume that the univariate variable importance of each variable is significantly large (i.e., that both variables are marginally informative), nor to honor the hierarchy restriction in the sense of Bien et al. (Bien, Taylor & Tibshirani, 2013), under which an interaction may be included in a pairwise interaction model only if one or both variables are marginally important. We therefore built upon the concepts of maximal subtree (Ishwaran, 2007) and minimal depth of a maximal subtree (Ishwaran et al., 2010), both introduced by Ishwaran, to define an alternative bivariate interaction statistic between any two variables xj and xk, which we term Interaction Minimal Depth Maximal Subtree (IMDMS) and denote Ψ(j, k), for j, k ∈ {1, …, p}, j < k, as follows. Based on the original minimal depth concept, we first use the normalized minimal depth of variable xj with respect to the maximal subtree for variable xk, denoted MDMS(xj, xk): the shortest distance at which xj splits under xk, normalized by the height of the tree whose root node is the xk split (see Ishwaran et al., 2010, for details). A small value indicates that xj is related to xk. Because MDMS(xj, xk) is not symmetric in its arguments, we then use it together with its reciprocal MDMS(xk, xj) to define Ψ(j, k) for j, k ∈ {1, …, p}, j < k, as:

A small IMDMS value identifies a possible pairwise interaction.
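To make the minimal depth idea concrete, the following Python sketch computes a directed, normalized minimal depth on a single toy tree and then symmetrizes it. It is an illustration under stated assumptions, not the authors' implementation: the tuple encoding of the tree and the names `mdms` and `imdms` are hypothetical, height is counted in edges down to a terminal node, a variable that never splits under the other receives the value 1.0, the smallest value is taken when a tree has several maximal subtrees, and the symmetrization is assumed to be the average of the two directed values. In a forest, such per-tree values would additionally be aggregated over trees.

```python
from itertools import combinations

# Toy tree: an internal node is (split_variable, left, right); a terminal node is None.

def height(node):
    """Number of edges on the longest path from node down to a terminal node."""
    if node is None:
        return 0
    _, left, right = node
    return 1 + max(height(left), height(right))

def maximal_subtrees(node, var):
    """Subtrees rooted at a split on var that have no ancestor splitting on var."""
    if node is None:
        return []
    v, left, right = node
    if v == var:
        return [node]
    return maximal_subtrees(left, var) + maximal_subtrees(right, var)

def min_depth_below(node, var, depth=0):
    """Shortest distance from the subtree root to a split on var strictly below it."""
    if node is None:
        return None
    v, left, right = node
    if v == var and depth > 0:
        return depth
    found = [d for d in (min_depth_below(left, var, depth + 1),
                         min_depth_below(right, var, depth + 1)) if d is not None]
    return min(found) if found else None

def mdms(tree, j, k):
    """Directed value: normalized minimal depth of x_j under the maximal subtree of x_k."""
    values = []
    for sub in maximal_subtrees(tree, k):
        d = min_depth_below(sub, j)
        if d is not None:
            values.append(d / height(sub))   # normalize by the height of x_k's subtree
    return min(values) if values else 1.0    # assumption: 1.0 when x_j is absent

def imdms(tree, j, k):
    """Assumed symmetrization: average of the two directed values."""
    return 0.5 * (mdms(tree, j, k) + mdms(tree, k, j))

# Root splits on x0; below it x1 splits, then x0 again; x2 splits on the other branch.
tree = (0, (1, (0, None, None), None), (2, None, None))

scores = {(j, k): imdms(tree, j, k) for j, k in combinations(range(3), 2)}
closest_pair = min(scores, key=scores.get)   # smallest score = strongest candidate
```

On this toy tree the pair (x0, x1) has the smallest score because each variable splits directly under the other, whereas x1 and x2 never split under one another and receive the maximal value.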
