Principal component analysis (PCA) is a widely used dimensionality-reduction algorithm; it can also be regarded as a multivariate statistical method for capturing the dominant structure of a data set [20]. It has become one of the most commonly used feature extraction methods. PCA reduces dimensionality through a linear mapping that decorrelates the data while preserving as much of the original variance as possible. PCA can therefore be used to preprocess raw electronic-nose data: it suppresses noise and unimportant features, extracts the informative components, greatly reduces the difficulty and complexity of subsequent processing, and improves both processing speed and robustness to interference.
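The linear mapping described above can be sketched with a singular value decomposition of the mean-centered data. This is a minimal illustration in Python/NumPy, not the implementation used in the protocol; the function name and return values are our own:

```python
import numpy as np

def pca(X, n_components):
    """Project X (samples x features) onto its top principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    # Right singular vectors (rows of Vt) are the principal axes,
    # ordered by decreasing singular value, i.e. decreasing variance.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    scores = X_centered @ components.T          # low-dimensional representation
    explained_variance = (S ** 2) / (len(X) - 1)
    # Fraction of total variance retained by each kept component
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return scores, ratio
```

The explained-variance ratio is the "contribution rate" typically used to decide how many components to keep.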
Linear discriminant analysis (LDA) finds the best projection direction by maximizing the Fisher criterion function, so that after the samples are projected in this direction the inter-class scatter is largest and the intra-class scatter is smallest [21]. It is a classical algorithm for pattern recognition. The main idea is to project the samples to be classified onto a one-dimensional line that minimizes the variance within each class and maximizes the variance between classes [22]. Compared with PCA, LDA exploits the class-membership information of all the data, tightening the distribution within each class while increasing the distance between classes, and thus improves classification accuracy.
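For two classes, maximizing the Fisher criterion has a closed-form solution: the optimal direction is the inverse within-class scatter matrix applied to the difference of the class means. A minimal sketch (our own function name, two-class case only):

```python
import numpy as np

def fisher_lda_direction(X0, X1):
    """Fisher discriminant direction for two classes (rows are samples)."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix S_w = sum of per-class scatter
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Optimal direction: w = S_w^{-1} (m1 - m0)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)
```

Projecting the samples onto `w` (a dot product) gives the one-dimensional representation on which a threshold can separate the classes.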
K-nearest neighbor (KNN), also called the Reference Sample Plot Method, is a simple, effective and non-parametric classification method [23]. The core idea of KNN is to take a prediction target, compute the distance or similarity between the target and all training samples, select the K closest samples and let them vote on the decision [24]. The basic steps are as follows: construct the training and test sets and set the value of k; compute the distances between each test sample and all training samples and sort them in ascending order; select the k samples with the smallest distances as the k nearest neighbors of the test sample; and assign the test sample to the class that occurs most frequently among these k neighbors.
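The voting procedure above can be written in a few lines. This is an illustrative sketch using Euclidean distance (the function name and the choice of distance metric are our own assumptions):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    # Euclidean distance from x to every training sample
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k smallest distances (ascending sort)
    nearest = np.argsort(distances)[:k]
    # Majority vote among the k nearest neighbors' labels
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

In practice k is usually chosen odd to avoid ties, and distances may be computed on PCA-reduced features rather than the raw sensor responses.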
Principal component analysis-discriminant analysis (PCA-DA) is a PCA-based analysis method that combines the PCA and LDA algorithms [25]. Principal component analysis is first performed on the sample data to achieve dimensionality reduction, and a corresponding model is then established to classify unknown samples. The specific steps are as follows: map the sample data into a feature subspace with the PCA algorithm, select the number of principal components according to the contribution rate or other indicators and finally apply the LDA algorithm to perform linear classification in that subspace. Because PCA-DA takes into account the class-membership information supplied, in coded form, by an auxiliary matrix when constructing the factors, it has efficient discrimination ability and at the same time improves the validity and reliability of the model.
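The two-stage pipeline above can be sketched for the two-class case by chaining a PCA projection with a Fisher discriminant in the reduced subspace. This is a minimal illustration under those assumptions (function names, the midpoint threshold and the 0/1 label coding are ours):

```python
import numpy as np

def pca_da_fit(X, y, n_components):
    """Fit a two-class PCA-DA model: PCA projection, then Fisher LDA."""
    # Step 1: PCA dimensionality reduction via SVD of centered data
    mean = X.mean(axis=0)
    Xc = X - mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T        # projection matrix to the PCA subspace
    T = Xc @ P                     # PCA scores
    # Step 2: Fisher discriminant direction in the PCA subspace
    T0, T1 = T[y == 0], T[y == 1]
    m0, m1 = T0.mean(axis=0), T1.mean(axis=0)
    Sw = (T0 - m0).T @ (T0 - m0) + (T1 - m1).T @ (T1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    threshold = ((m0 + m1) @ w) / 2    # midpoint between projected class means
    return mean, P, w, threshold

def pca_da_predict(x, mean, P, w, threshold):
    """Classify one sample: project with PCA, then threshold the LDA score."""
    return int((((x - mean) @ P) @ w) > threshold)
```

Running LDA in the PCA subspace avoids the singular within-class scatter matrices that arise when the number of sensors or features exceeds the number of samples.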
Partial least squares discriminant analysis (PLS-DA) is a discriminant analysis method based on PLS regression. The basic idea is to establish a qualitative analysis model from the characteristics of a known sample set [26]. It is trained on the characteristics of differently treated samples (such as observation samples and control samples), generating a training set whose credibility is then tested [27]. During the analysis, the overlapping portions of the abundant chemical information can be eliminated so as to extract the data most relevant to sample category, that is, to maximize the differences between the data extracted for different categories, making the analysis more accurate and reliable and improving the accuracy of classification discrimination.
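PLS-DA amounts to PLS regression against a coded class-indicator response: latent components are extracted to maximize covariance with the class labels, and a threshold on the predicted response assigns the class. A minimal two-class sketch using the NIPALS-style PLS1 algorithm (function names, the 0/1 coding and the 0.5 cutoff are our own assumptions):

```python
import numpy as np

def pls_da_fit(X, y, n_components=2):
    """PLS1 regression of a centered 0/1 class indicator on X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, (y - y_mean).astype(float)
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # weight: direction of max covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                     # scores
        p = Xk.T @ t / (t @ t)         # X loadings
        q = (yk @ t) / (t @ t)         # y loading
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - q * t                # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    # Regression coefficients in the original variable space
    B = W @ np.linalg.solve(P.T @ W, Q)
    return x_mean, y_mean, B

def pls_da_predict(x, x_mean, y_mean, B):
    """Predict the indicator value and threshold it at 0.5."""
    return int((x - x_mean) @ B + y_mean > 0.5)
```

Unlike ordinary regression on PCA scores, the PLS components are chosen using the labels, so the discriminative chemical information is concentrated in the first few components.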