Adjusted mutual information (AMI) was used to calculate the degree of relevance between features or patients. Mutual information (MI) is a measure of the amount of information that one random variable contains about another. AMI corrects for the bias whereby MI tends to increase with the size of the vectors, regardless of their actual association [21]. For two feature vectors U and V, AMI is calculated as follows:

$$\mathrm{AMI}(U, V) = \frac{\mathrm{MI}(U, V) - E[\mathrm{MI}(U, V)]}{\frac{1}{2}\left[H(U) + H(V)\right] - E[\mathrm{MI}(U, V)]}$$
where $E[\mathrm{MI}(U, V)]$ is the expected mutual information between two random labelings with the same cluster sizes as U and V, $H(U)$ and $H(V)$ are the entropies of U and V, and $\mathrm{MI}(U, V)$ is the mutual information between U and V, calculated as follows:

$$\mathrm{MI}(U, V) = \sum_{i=1}^{|U|} \sum_{j=1}^{|V|} \frac{|U_i \cap V_j|}{N} \log \frac{N \, |U_i \cap V_j|}{|U_i| \, |V_j|}$$
where $N$ denotes the total number of dimensions of vectors U and V (i.e., the number of samples), and $|U_i|$ and $|V_j|$ denote the number of samples in clusters $U_i$ and $V_j$, respectively.
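As a minimal sketch, AMI can be computed directly with scikit-learn's adjusted_mutual_info_score (the library version used in this protocol is cited below); the label vectors u and v here are hypothetical examples, not study data.

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score, mutual_info_score

# Hypothetical discrete feature vectors over N = 8 samples.
u = np.array([0, 0, 1, 1, 1, 2, 2, 2])
v = np.array([0, 0, 0, 1, 1, 1, 2, 2])

# Raw MI grows with the number of clusters regardless of association;
# AMI corrects for this, giving ~0 for random labelings and 1 for
# identical partitions.
print(mutual_info_score(u, v))           # raw MI (in nats)
print(adjusted_mutual_info_score(u, v))  # chance-adjusted MI
```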
Entropy focused on the feature classes was used to measure how the information contained in the SC type is distributed across combinations of feature classes. The random variable of this entropy is the feature class, and the probability of each feature class is defined as the sum of the AMI values between the target feature (i.e., SC type) and the features belonging to that class, divided by the sum of the AMI values between the target feature and all other features. To assess how uncommon the derived value is, we calculated the entropy values for all features in the same manner and compared the relative rank of the entropy value of the target feature. This ranking was used to identify the combination of feature classes whose information is mainly contained in the SC type; in other words, a combination of feature classes yielding a high relative rank represents the information mainly included in the feature of interest. AMI and entropy focused on the feature classes were calculated using scikit-learn 0.22.1.
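The class-focused entropy is not a single library call; the sketch below shows one way to compute it under the definition above. The feature matrix X, the class labels, and the target vector are hypothetical stand-ins for the study data, and the within-class/total AMI ratio follows the probability definition given in this section.

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score

# Hypothetical setup: each column of X is a discrete feature, `classes`
# assigns every feature to a feature class, and `target` stands in for
# the feature of interest (e.g., the SC type). All data are illustrative.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 6))    # 100 samples, 6 discrete features
classes = np.array(["clinical", "clinical", "imaging",
                    "imaging", "genetic", "genetic"])
noise = rng.integers(0, 3, size=100)
target = np.where(rng.random(100) < 0.3, noise, X[:, 0])  # mostly tracks feature 0

# AMI between the target and every feature (negative values clipped to 0).
ami = np.array([max(adjusted_mutual_info_score(target, X[:, j]), 0.0)
                for j in range(X.shape[1])])

# Probability of each feature class: within-class AMI sum / total AMI sum.
labels = sorted(set(classes))
probs = np.array([ami[classes == k].sum() for k in labels])
probs = probs / probs.sum()

# Entropy focused on the feature classes: low entropy means the target's
# information is concentrated in a few feature classes.
entropy = -sum(p * np.log(p) for p in probs if p > 0)
print(dict(zip(labels, probs.round(3))), round(entropy, 3))
```

In the protocol, this value would then be computed for every feature in the same manner and the relative rank of the SC-type entropy compared against the others, as described above.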