2.7 Tree-based concepts of bivariate variables interaction

Jean-Eudes Dazard; Hemant Ishwaran; Rajeev Mehlotra; Aaron Weinberg; Peter Zimmerman

This method is extracted from research article: Stat Appl Genet Mol Biol, Feb 2018

Ensemble survival tree models to reveal pairwise interactions of variables with time-to-events outcomes in low-dimensional setting

DOI: 10.1515/sagmb-2017-0038

Building on Ishwaran’s initial work on a pairwise interaction statistic (Ishwaran, 2007), a paired interaction statistic between variables x_j and x_k, denoted VIMP(x_j, x_k), can be defined for j, k ∈ {1, …, p}, j < k, as follows: VIMP(x_j, x_k) = |PIMP(x_j, x_k) − AIMP(x_j, x_k)|. Here PIMP(x_j, x_k) is the paired joint variable importance of x_j and x_k, defined as the amount by which prediction error increases (or decreases) when x_j and x_k are perturbed simultaneously, and AIMP(x_j, x_k) is the additive variable importance, defined as the sum of the individual variable importances: AIMP(x_j, x_k) = VIMP(x_j) + VIMP(x_k). If the univariate variable importance of each variable is significantly large, a large VIMP(x_j, x_k) indicates a possible pairwise interaction (see Ishwaran, 2007, for more details).
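The statistic |PIMP − AIMP| can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration rather than the article's procedure: the "model" is a hypothetical oracle predictor (toy_predict), perturbation is approximated by jointly permuting the rows of the selected columns, and prediction error is mean squared error. In a random survival forest the perturbation and error measure would be those of the forest itself.

```python
import numpy as np


def prediction_error(predict, X, y):
    """Mean squared prediction error (an assumed error measure)."""
    return float(np.mean((predict(X) - y) ** 2))


def perturbation_importance(predict, X, y, columns, rng):
    """Change in prediction error when the given columns are perturbed.

    The perturbation here is a single row permutation applied jointly to
    all selected columns (an assumption standing in for the forest's own
    perturbation scheme).
    """
    base = prediction_error(predict, X, y)
    X_perturbed = X.copy()
    perm = rng.permutation(X.shape[0])
    X_perturbed[:, columns] = X[perm][:, columns]
    return prediction_error(predict, X_perturbed, y) - base


def paired_interaction(predict, X, y, j, k, rng):
    """VIMP(x_j, x_k) = |PIMP(x_j, x_k) - AIMP(x_j, x_k)|."""
    pimp = perturbation_importance(predict, X, y, [j, k], rng)
    aimp = (perturbation_importance(predict, X, y, [j], rng)
            + perturbation_importance(predict, X, y, [k], rng))
    return abs(pimp - aimp)


def toy_predict(X):
    """Hypothetical predictor: x_0 and x_1 interact, x_2 acts additively."""
    return X[:, 0] * X[:, 1] + X[:, 2]


rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 3))
y = toy_predict(X)

inter_01 = paired_interaction(toy_predict, X, y, 0, 1, rng)  # interacting pair
inter_02 = paired_interaction(toy_predict, X, y, 0, 2, rng)  # additive pair
print(f"VIMP(x_0, x_1) = {inter_01:.2f}, VIMP(x_0, x_2) = {inter_02:.2f}")
```

In this toy setting the paired importance of the additive pair is close to the sum of the individual importances, so its statistic stays near zero, whereas the interacting pair departs clearly from additivity.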

However, we do not necessarily want to assume that the univariate variable importance of each variable is significantly large (i.e., that both variables are marginally informative), nor to honor the hierarchy restriction in the sense of Bien et al. (Bien, Taylor & Tibshirani, 2013), under which an interaction may be included in a pairwise interaction model only if one or both variables are marginally important. We therefore built upon the concepts of maximal subtree (Ishwaran, 2007) and minimal depth of a maximal subtree (Ishwaran et al., 2010) to define an alternative bivariate interaction statistic between any two variables x_j and x_k, which we termed Interaction Minimal Depth Maximal Subtree (IMDMS) and denoted Ψ(j, k), for j, k ∈ {1, …, p}, j < k.

Based on the original minimal depth concept, we first use the normalized minimal depth of a variable x_j with respect to the maximal subtree for variable x_k, denoted MDMS(x_j, x_k): the shortest distance at which x_j splits under x_k, normalized with respect to the height of the tree whose root node is the x_k split (see Ishwaran et al., 2010, for details). A small value indicates that x_j is related to x_k. Because MDMS(x_j, x_k) is not symmetric in its arguments, we then also use the reciprocal MDMS(x_k, x_j) to define Ψ(j, k), for j, k ∈ {1, …, p}, j < k.

A small IMDMS value identifies a possible pairwise interaction.
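The maximal-subtree construction can be sketched on a single toy tree. The code below is a minimal illustration under stated assumptions, not the article's implementation: the Node class and all function names are hypothetical, MDMS is set to 1 (its largest normalized value) when x_j never splits under x_k, and the two directions are symmetrized by a plain average, which is only one possible choice and is not necessarily the article's definition of Ψ(j, k).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Binary tree node; var is the split variable index, None if terminal."""
    var: Optional[int] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def height(node: Optional[Node]) -> int:
    """Number of edges on the longest path from node down to a terminal node."""
    if node is None or node.var is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def maximal_subtree_roots(node: Optional[Node], k: int) -> List[Node]:
    """Nodes that split on x_k and have no ancestor that splits on x_k."""
    if node is None or node.var is None:
        return []
    if node.var == k:
        return [node]
    return maximal_subtree_roots(node.left, k) + maximal_subtree_roots(node.right, k)


def min_depth(node: Optional[Node], j: int, depth: int = 0) -> Optional[int]:
    """Shortest distance from the starting node to a split on x_j, or None."""
    if node is None or node.var is None:
        return None
    if node.var == j:
        return depth
    found = [d for d in (min_depth(node.left, j, depth + 1),
                         min_depth(node.right, j, depth + 1)) if d is not None]
    return min(found) if found else None


def mdms(tree: Node, j: int, k: int) -> float:
    """Normalized minimal depth of x_j within the maximal subtree(s) of x_k.

    Assumption: the value is 1.0 when x_j never splits under x_k, and the
    smallest value is taken if x_k has several maximal subtrees.
    """
    values = []
    for root in maximal_subtree_roots(tree, k):
        d = min_depth(root, j)
        values.append(1.0 if d is None else d / height(root))
    return min(values) if values else 1.0


def psi(tree: Node, j: int, k: int) -> float:
    """Symmetrized statistic: average of both directions (an assumption)."""
    return 0.5 * (mdms(tree, j, k) + mdms(tree, k, j))


# Toy tree: x_0 at the root, x_1 directly below it, x_2 below x_1.
tree = Node(0, Node(1, Node(), Node(2, Node(), Node())), Node())

psi_01 = psi(tree, 0, 1)
psi_02 = psi(tree, 0, 2)
psi_12 = psi(tree, 1, 2)
print(psi_01, psi_02, psi_12)
```

In this toy tree x_1 splits immediately under x_0 while x_2 splits one level further down, so the (x_0, x_1) pair receives a smaller value than the (x_0, x_2) pair. For an ensemble, the per-tree values would presumably be aggregated over trees, for example by averaging.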
