3.2. Feature Dimensionality Reduction

Dimensionality reduction aims to reduce the dimension of the feature space. Datasets are growing and being updated at an accelerating pace, and data are becoming increasingly high-dimensional and unstructured [19]. Useful information is buried in complex data, and the essential characteristics of the data are difficult to discover. Achieving low information loss during feature dimensionality reduction, preserving the properties of the original data, and obtaining optimal low-dimensional data are therefore important goals of this paper.

After feature extraction, the dimension of the feature space increases to 44. To reduce the computational load on the processor, the dimension of the feature space must be reduced. There are two approaches to feature dimensionality reduction: feature extraction and feature selection. The former maps the feature space to a space with fewer dimensions and more independent features; its representative method is principal component analysis (PCA). The latter selects a subset of features from the original feature space to form a new feature space. Its characteristic is that the features themselves do not change and still retain their original meaning, whereas the features produced by PCA lose their original meaning after dimensionality reduction.

In the feature selection method, how to measure the value of a feature is the primary problem. Feature importance is an indicator of the contribution of a feature to the prediction results and can be used as a reference for screening features. Different algorithms compute feature importance in different ways. XGBoost uses weight as the feature importance by default, and it provides three different indicators: weight counts the number of times a feature is used to split nodes; gain is the average gain obtained when splitting on that feature; cover is the average coverage of the splits on that feature. The quotient of each feature's score and the total score over all features is output as the feature importance.
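In symbols, this normalization can be written as follows (a reconstruction from the description above, where $s_i$ denotes the raw score of feature $i$ under the chosen indicator and $n$ is the number of features):

$$\mathrm{importance}_i = \frac{s_i}{\sum_{j=1}^{n} s_j}$$

A minimal sketch of retrieving the three indicators with the xgboost Python package is shown below; the data, model hyperparameters, and feature count are placeholders standing in for the 44-dimensional feature space of this paper.

```python
import numpy as np
import xgboost as xgb

# Placeholder training data standing in for the 44-dimensional feature space.
X = np.random.rand(200, 44)
y = np.random.randint(0, 2, size=200)

model = xgb.XGBClassifier(n_estimators=50, max_depth=3)
model.fit(X, y)

booster = model.get_booster()
for importance_type in ("weight", "gain", "cover"):
    scores = booster.get_score(importance_type=importance_type)
    total = sum(scores.values())
    # Normalize each score by the total so the importances sum to 1.
    normalized = {f: s / total for f, s in scores.items()}
    print(importance_type, sorted(normalized.items(), key=lambda kv: -kv[1])[:5])
```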

Based on feature importance, this paper designs a feature selection method. It aims to remove features that contribute little to classification and to improve the feature importance of all remaining features in the feature space. The main idea is to delete features over multiple iterations, continuously refining the feature space until the optimal feature space is obtained. The overall process is shown in Figure 5.

Figure 5. The process of dimensionality reduction.

A lower threshold L is involved in the dimensionality reduction process; there is no uniform standard for its value. It depends on the actual application scenario and is closely related to the overall design of the system. This value controls the number of calculations in a single round of pruning and provides the theoretical minimum of the feature dimension. In this paper, L is set to 10 in order to search for the optimal feature space over a wide range.
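One possible reading of this iterative pruning is sketched below. The scoring model, the stopping rule, and the exact role of L are assumptions made for illustration; the paper's actual procedure is the one given in Figure 5.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import cross_val_score

def prune_features(X, y, feature_names, lower_threshold=10):
    """Iteratively drop the least important feature until only
    `lower_threshold` features remain, keeping the best subset seen.
    Illustrative sketch only, not the exact procedure of Figure 5."""
    selected = list(feature_names)
    best_subset, best_score = selected[:], -np.inf

    while len(selected) > lower_threshold:
        cols = [feature_names.index(f) for f in selected]
        model = xgb.XGBClassifier(n_estimators=50, max_depth=3)

        # Evaluate the current feature subset.
        score = cross_val_score(model, X[:, cols], y, cv=5).mean()
        if score > best_score:
            best_score, best_subset = score, selected[:]

        # Refit on the subset and remove the least important feature.
        model.fit(X[:, cols], y)
        importances = model.feature_importances_
        selected.pop(int(np.argmin(importances)))

    return best_subset, best_score
```

For the 44-dimensional feature space above, this would be called with a list of 44 feature names and lower_threshold=10, returning the best-scoring subset encountered during pruning.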

PCA is one of the most widely used feature dimensionality reduction methods; it aims to reduce the dimensionality of a dataset while preserving as much variability as possible [20,21]. In PCA, the features obtained after dimensionality reduction should reflect the information contained in the original data as fully as possible, and these features should be as independent as possible. The specific approach is to map high-dimensional data into a low-dimensional space, replacing the initial features with a smaller number of features. PCA extracts the most valuable information based on variance and ensures linear independence between the features after dimension reduction. Because it requires no subjective parameters, PCA is convenient to apply in general, but this also means it cannot be tailored to a specific problem. At the same time, using PCA for dimensionality reduction requires the data to obey a Gaussian distribution; otherwise, the principal features obtained may not be optimal.
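A minimal sketch of PCA-based reduction with scikit-learn follows; the data and the retained-variance target of 95% are illustrative choices, not values taken from this paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for the 44-dimensional feature matrix.
X = np.random.rand(200, 44)

# Standardize first so variance-based component selection is not
# dominated by features with larger numeric ranges.
X_scaled = StandardScaler().fit_transform(X)

# Keep enough components to explain 95% of the variance (illustrative choice).
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                      # (200, k) with k <= 44
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```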
