Adjusted mutual information (AMI) was used to calculate the degree of relevance between features or patients. Mutual information (MI) is a measure of the amount of information that one random variable contains about another. AMI corrects for the bias whereby MI tends to increase with the size of the vectors, regardless of their actual association [21]. For two feature vectors U and V, AMI is calculated as follows:

$$\mathrm{AMI}(U, V) = \frac{\mathrm{MI}(U, V) - E[\mathrm{MI}(U, V)]}{\frac{1}{2}\left[H(U) + H(V)\right] - E[\mathrm{MI}(U, V)]}$$
where $E[\mathrm{MI}(U, V)]$ is the expected mutual information between two random labelings with the same cluster sizes as U and V, $H(U)$ and $H(V)$ are the entropies of U and V, and $\mathrm{MI}(U, V)$ is the mutual information between U and V, calculated as follows:

$$\mathrm{MI}(U, V) = \sum_{i=1}^{|U|} \sum_{j=1}^{|V|} \frac{|U_i \cap V_j|}{N} \log \frac{N \, |U_i \cap V_j|}{|U_i| \, |V_j|}$$
where $N$ denotes the total number of dimensions of vectors U and V (i.e., the number of samples), and $|U_i|$ and $|V_j|$ denote the number of samples in clusters $U_i$ and $V_j$, respectively.
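As a minimal sketch, AMI can be computed directly with scikit-learn's adjusted_mutual_info_score (the library version used in this protocol is cited below); the label vectors u and v here are hypothetical examples, not study data.

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score, mutual_info_score

# Hypothetical discrete feature vectors over N = 8 samples.
u = np.array([0, 0, 1, 1, 1, 2, 2, 2])
v = np.array([0, 0, 0, 1, 1, 1, 2, 2])

# Raw MI grows with the number of clusters regardless of association;
# AMI corrects for this, giving ~0 for random labelings and 1 for
# identical partitions.
print(mutual_info_score(u, v))           # raw MI (in nats)
print(adjusted_mutual_info_score(u, v))  # chance-adjusted MI
```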
Entropy focused on the feature classes was used to measure how the information contained in the SC type is distributed across combinations of feature classes. The random variable of this entropy is the feature class, and the probability of each feature class is defined as the sum of the AMI values between the target feature (i.e., SC type) and the features belonging to that class, divided by the sum of the AMI values between the target feature and all other features. To assess how uncommon the derived value is, we calculated the entropy values for all features in the same manner and compared the relative rank of the entropy value of the target feature. This ranking was used to identify the combination of feature classes whose information is mainly contained in the SC type; in other words, a combination of feature classes yielding a high relative rank represents the information mainly included in the feature of interest. AMI and entropy focused on the feature classes were calculated using scikit-learn 0.22.1.
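The class-focused entropy is not a single library call; the sketch below shows one way to compute it under the definition above. The feature matrix X, the class labels, and the target vector are hypothetical stand-ins for the study data, and the within-class/total AMI ratio follows the probability definition given in this section.

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score

# Hypothetical setup: each column of X is a discrete feature, `classes`
# assigns every feature to a feature class, and `target` stands in for
# the feature of interest (e.g., the SC type). All data are illustrative.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 6))    # 100 samples, 6 discrete features
classes = np.array(["clinical", "clinical", "imaging",
                    "imaging", "genetic", "genetic"])
noise = rng.integers(0, 3, size=100)
target = np.where(rng.random(100) < 0.3, noise, X[:, 0])  # mostly tracks feature 0

# AMI between the target and every feature (negative values clipped to 0).
ami = np.array([max(adjusted_mutual_info_score(target, X[:, j]), 0.0)
                for j in range(X.shape[1])])

# Probability of each feature class: within-class AMI sum / total AMI sum.
labels = sorted(set(classes))
probs = np.array([ami[classes == k].sum() for k in labels])
probs = probs / probs.sum()

# Entropy focused on the feature classes: low entropy means the target's
# information is concentrated in a few feature classes.
entropy = -sum(p * np.log(p) for p in probs if p > 0)
print(dict(zip(labels, probs.round(3))), round(entropy, 3))
```

In the protocol, this value would then be computed for every feature in the same manner and the relative rank of the SC-type entropy compared against the others, as described above.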