Methods for model selection

Andrew J. Sedgewick
Ivy Shi
Rory M. Donovan
Panayiotis V. Benos

K-fold cross-validation (CV) [14] splits the data into K subsets and holds each subset out once for validation while training on the rest. We use K = 5 and average the negative log-pseudolikelihood of the test sets given the trained models. The Akaike information criterion (AIC) [15] and Bayesian information criterion (BIC) [16] are model selection methods that penalize the likelihood of a model according to its size, measured in degrees of freedom. To calculate the AIC and BIC, we substitute the pseudolikelihood for the likelihood and define the degrees of freedom of the learned network as described below.
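As an illustrative sketch of how these three scores could be computed (not the authors' implementation), assume hypothetical routines fit_mgm, which fits an MGM at a given sparsity setting, and neg_log_pl, which evaluates the negative log-pseudolikelihood of data under a fitted model:

import numpy as np

def aic(neg_log_pl_value, df):
    # AIC with the pseudolikelihood substituted for the likelihood:
    # 2 * (degrees of freedom) + 2 * (negative log-pseudolikelihood)
    return 2.0 * df + 2.0 * neg_log_pl_value

def bic(neg_log_pl_value, df, n):
    # BIC penalizes each degree of freedom by log(sample size)
    return np.log(n) * df + 2.0 * neg_log_pl_value

def cv_score(data, fit_mgm, neg_log_pl, lam, K=5, seed=0):
    # K-fold CV: hold each fold out once, train on the remaining folds,
    # and average the held-out negative log-pseudolikelihood (K = 5 here).
    n = data.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), K)
    scores = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        model = fit_mgm(data[train], lam)      # hypothetical fitting routine
        scores.append(neg_log_pl(model, data[test]))
    return float(np.mean(scores))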

In the standard lasso problem, the degrees of freedom is simply the number of non-zero regression coefficients [17]. So, in the continuous case, the degrees of freedom of a graphical lasso model is the number of edges in the learned network. In the mixed case, edges incident to discrete variables have additional coefficients corresponding to each level of the variable. Lee and Hastie’s MGM uses group penalties on the edge vectors, ρ, and matrices, ϕ, to ensure that all dimensions sum to zero. So, in the model, an edge between two continuous variables adds one degree of freedom, an edge between a continuous variable and a categorical variable with L levels adds L - 1 degrees of freedom, and an edge between two discrete variables with Li and Lj levels adds (Li - 1)(Lj - 1) degrees of freedom.
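A minimal sketch of this degrees-of-freedom count, representing each learned edge as a pair of level counts (None for a continuous variable, the number of categorical levels for a discrete one); the function names are illustrative, not from the source:

def edge_df(levels_i, levels_j):
    if levels_i is None and levels_j is None:
        return 1                               # continuous-continuous edge
    if levels_i is None:
        return levels_j - 1                    # continuous-discrete edge
    if levels_j is None:
        return levels_i - 1
    return (levels_i - 1) * (levels_j - 1)     # discrete-discrete edge

def model_df(edges):
    # edges: iterable of (levels_i, levels_j) pairs for the learned network
    return sum(edge_df(li, lj) for li, lj in edges)

For example, a network with one continuous-continuous edge, one edge from a continuous variable to a three-level categorical variable, and one edge between three- and four-level categorical variables has model_df([(None, None), (None, 3), (3, 4)]) = 1 + 2 + 6 = 9 degrees of freedom.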

We compare these model selection methods to an oracle selection method. For the oracle model, we select the sparsity parameters that minimize the total number of false positive and false negative edges between the estimated graph and the true graph. Although the true graph is unknown in practice, and none of the other methods use it, the oracle shows the best possible model selection performance under our experimental conditions.
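In sketch form, oracle selection over a set of candidate models could look like the following, with graphs represented as Python sets of edges (the candidate list is assumed to come from models fit across the sparsity grid; nothing here is from the source code):

def oracle_select(candidates, true_edges):
    # candidates: list of (sparsity_setting, estimated_edge_set) pairs.
    # Choose the setting that minimizes false positives + false negatives
    # relative to the true edge set.
    def errors(estimated):
        fp = len(estimated - true_edges)   # estimated edges not in the true graph
        fn = len(true_edges - estimated)   # true edges that were missed
        return fp + fn
    return min(candidates, key=lambda c: errors(c[1]))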

AIC, BIC, and CV all require calculating the pseudolikelihood from a learned model, so to optimize over separate sparsity penalties for each edge type, we perform a cubic grid search of λcc, λcd, and λdd over {0.64, 0.32, 0.16, 0.08, 0.04}.
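The grid search itself can be sketched as below; fit_and_score is a hypothetical stand-in that fits an MGM at a given (λcc, λcd, λdd) and returns the chosen criterion (AIC, BIC, or the CV score), so the minimizing combination is selected:

from itertools import product

GRID = [0.64, 0.32, 0.16, 0.08, 0.04]

def grid_search(fit_and_score):
    # Evaluate all 5^3 = 125 combinations of (lambda_cc, lambda_cd, lambda_dd)
    # and return the combination with the lowest score.
    best_score, best_lams = float("inf"), None
    for lam_cc, lam_cd, lam_dd in product(GRID, repeat=3):
        score = fit_and_score(lam_cc, lam_cd, lam_dd)
        if score < best_score:
            best_score, best_lams = score, (lam_cc, lam_cd, lam_dd)
    return best_lams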
