Maximum information coefficient (MIC)

Qinqing Xiong; Wenju Wang; Mingya Wang; Chunhui Zhang; Xuechun Zhang; Chun Chen; Mingshi Wang

This method is extracted from research article: iScience, Nov 2022

Prediction of ground-level ozone by SOM-NARX hybrid neural network based on the correlation of predictors

DOI: 10.1016/j.isci.2022.105658

Predictors are critical to a model’s prediction performance: too many irrelevant variables, or missing key variables, reduce prediction accuracy.²⁴ Predictor screening methods fall mainly into trial-and-error²⁶ and analytical methods.³⁸ The trial-and-error method is simple to operate, but it is computationally expensive and does not reveal the relationship between predictors and accuracy; the analytical method, which is based on factor correlation, is therefore preferable. However, Pearson and Spearman correlation coefficients are sensitive only to linear (Pearson) or monotonic (Spearman) relationships and cannot effectively capture the nonlinear relationships between ozone and its meteorological factors and precursors. Mutual information (MI) performs well in analyzing nonlinear relationships between variables, but the probability density functions of the variables are generally unknown, which makes MI difficult to estimate.³⁹,⁴⁰ In contrast, the maximum information coefficient (MIC) is applicable to any functional relationship, whether linear or nonlinear, and is less affected by outliers in the variables. Therefore, this study used MIC to screen for factors that show some correlation with ozone and used them as predictors.

Reshef et al. proposed the maximum information coefficient (MIC) to analyze nonlinear correlations in large data sets.²⁷ MIC is calculated from mutual information and grid division. Mutual information is an important indicator of the degree of correlation between variables; it and the quantities derived from it are defined as (Equations 5–8):
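The equations themselves are not reproduced in this extract. As a reconstruction rather than a verbatim copy of the source article, the standard formulation of MIC by Reshef et al. can be written as the four relations below, which presumably correspond to Equations 5–8. Here I denotes mutual information, G is a grid with x columns and y rows, and D|G is the distribution induced by dividing data D with grid G; the base-2 logarithm is an assumption.

```latex
\begin{align}
I(A;B) &= \sum_{a \in A} \sum_{b \in B} p(a,b)\,\log_2 \frac{p(a,b)}{p(a)\,p(b)} \\
I^{*}(D,x,y) &= \max_{G} I(D|G) \\
M(D)_{x,y} &= \frac{I^{*}(D,x,y)}{\log_2 \min\{x,y\}} \\
\mathrm{MIC}(D) &= \max_{xy < B(n)} M(D)_{x,y}
\end{align}
```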

where A = {a_i, i = 1, 2, ···, n}; B = {b_i, i = 1, 2, ···, n}; n denotes the number of samples; p(a, b) is the joint probability density of A and B; p(a) and p(b) are the marginal probability densities of A and B, respectively; MIC is the maximum information coefficient; D|G denotes that data D is divided using grid G; M(D)_{x, y} is the maximum normalized MI value obtained over the different ways of dividing the data into an x×y grid; and B(n) is the upper limit on the grid size x×y, which is generally defined as ω(1) ≤ B(n) ≤ O(n^(1−ε)), 0 < ε < 1.
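The calculation described above can be sketched in Python. This is a simplified illustration, not the authors' implementation: it only tries equal-frequency grids instead of searching for the optimal partition as the algorithm of Reshef et al. does, so it generally gives a lower bound on the true MIC. The function names, the base-2 logarithm, and the choice B(n) = n^0.6 are assumptions made for the example.

```python
import math
from collections import Counter


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal numbers of points."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * k // len(values)
    return labels


def mutual_information(a_labels, b_labels):
    """Discrete MI in bits, from the empirical joint distribution of two label lists."""
    n = len(a_labels)
    joint = Counter(zip(a_labels, b_labels))
    count_a = Counter(a_labels)
    count_b = Counter(b_labels)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


def approx_mic(a, b, alpha=0.6):
    """Maximise normalised MI over x-by-y equal-frequency grids with x * y < n ** alpha.

    Assumption: B(n) = n ** alpha. Returns 0.0 when n is too small for any 2 x 2 grid.
    """
    limit = len(a) ** alpha
    best = 0.0
    x = 2
    while x * 2 < limit:
        y = 2
        while x * y < limit:
            mi = mutual_information(equal_frequency_bins(a, x),
                                    equal_frequency_bins(b, y))
            best = max(best, mi / math.log2(min(x, y)))
            y += 1
        x += 1
    return best


if __name__ == "__main__":
    u = [i / 100 - 1 for i in range(201)]
    print(approx_mic(u, [v * v for v in u]))  # quadratic, non-monotonic relationship
```

In this sketch a symmetric quadratic relationship such as b = a² scores close to 1 even though its Pearson correlation is zero, which is the property that motivates using MIC for predictor screening. For real analyses, a tested implementation of the full grid-optimisation algorithm would be preferable.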

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
