Mel-frequency cepstral coefficients (MFCCs)

Mesut Melek

Concise Method

This method is extracted from research article: Neural Comput Appl, Jul 2021

Diagnosis of COVID-19 and non-COVID-19 patients by classifying only a single cough sound

DOI: 10.1007/s00521-021-06346-3

MFCCs are among the most popular and successful features in voice analysis and automatic speech recognition systems [44]. MFCC extraction is a digital signal analysis technique that approximates the perception of the human ear and is computed from the Fast Fourier Transform (FFT). Because the characteristics of a speech signal remain approximately stable only over a very short time interval (about 20–30 ms), the signal is processed in short segments of that length [45, 46]. Each such segment is called a frame. Frames are usually chosen to overlap so that transitions between frames are smoother. As in the calculation of a spectrogram, each frame is multiplied by a window function to avoid discontinuities at its beginning and end. The Hamming window is commonly used. After windowing, the FFT is applied to transform each frame from the time domain to the frequency domain.

The mel is a unit designed to imitate the perceptual response of the human ear. Conversion between the frequency scale and the mel scale is commonly expressed as $m = 2595 \log_{10}(1 + f/700)$, where $f$ is the frequency in Hz and $m$ is the corresponding value in mels.
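The framing, windowing, and FFT steps described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes NumPy, the common mel mapping $m = 2595 \log_{10}(1 + f/700)$, and a hypothetical 16 kHz synthetic tone in place of a cough recording. The function names `hz_to_mel`, `mel_to_hz`, `frame_signal`, and `power_spectrum` are introduced here for illustration only.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Common Hz-to-mel mapping (an assumption, not confirmed by the source)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def frame_signal(x, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames, one frame per row.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return x[idx]

def power_spectrum(frames):
    """Apply a Hamming window to each frame, then take the FFT power spectrum."""
    window = np.hamming(frames.shape[1])
    spectrum = np.fft.rfft(frames * window, axis=1)
    return np.abs(spectrum) ** 2 / frames.shape[1]

# Hypothetical example: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
frame_len = int(0.025 * sr)   # 25 ms frame, within the 20-30 ms range
hop_len = frame_len // 2      # frames overlap by half their length
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)

frames = frame_signal(x, frame_len, hop_len)
pspec = power_spectrum(frames)
print(frames.shape, pspec.shape)  # (79, 400) (79, 201)
```

Under this mapping 1000 Hz corresponds to roughly 1000 mel, which is a quick sanity check on the constants.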

MFCCs therefore express the short-time power spectrum of the sound signal on the mel scale [47, 48]. When MFCCs are calculated for a cough sound, the result is an M $\times$ N matrix, where M is the number of MFCCs and N is the number of segments (frames).
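To make the M $\times$ N shape concrete, the sketch below computes an MFCC matrix with NumPy only. The steps after the FFT (a triangular mel filterbank, a logarithm, and a DCT-II) follow the conventional MFCC recipe and are not spelled out in the text, so they should be read as assumptions. The defaults `n_mfcc=13` and `n_filters=26`, the mel mapping, and the random test signal are likewise hypothetical choices made for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fbank = np.zeros((n_filters, fft_freqs.size))
    for i in range(n_filters):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def dct_ii(x, n_out):
    """Unnormalised DCT-II along axis 0, keeping the first n_out coefficients."""
    n = x.shape[0]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    return basis @ x

def mfcc(x, sr, frame_len, n_mfcc=13, n_filters=26):
    """Return an (n_mfcc, n_frames) matrix: Hamming window, 50% overlap."""
    hop = frame_len // 2
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / frame_len  # (N, bins)
    fbank = mel_filterbank(n_filters, frame_len, sr)              # (filters, bins)
    log_mel = np.log(fbank @ power.T + 1e-10)                     # (filters, N)
    return dct_ii(log_mel, n_mfcc)                                # (M, N)

# Hypothetical input: 0.5 s of noise at 16 kHz standing in for a cough sound.
rng = np.random.default_rng(0)
signal = rng.standard_normal(8000)
coeffs = mfcc(signal, sr=16000, frame_len=400)
print(coeffs.shape)  # (13, 39): M = 13 coefficients, N = 39 frames
```

Each column of `coeffs` holds the coefficients of one frame, so taking the first few columns corresponds to using only the first few segments for feature extraction.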

In the literature, MFCCs have been used for the classification of cough sounds. For example, in [20], features were extracted with the MFCC method to classify dry and wet coughs. Obtaining features with the MFCC method requires attention to several important factors, called hyperparameters: the type of window, the frame length, the frame overlap length, the number of segments used for feature extraction, and the number of MFCCs. In this study, the chosen window type was Hamming, and the frame overlap length was half of the frame length. The optimum values of the other three hyperparameters (the frame length, the number of MFCCs, and the number of segments used for feature extraction) were chosen using the leave-one-out cross-validation (LOO-CV) strategy.
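The hyperparameter search described above can be sketched as a grid search scored by leave-one-out accuracy. Several pieces here are assumptions rather than the study's setup: the 1-nearest-neighbour classifier is only a stand-in, since this section does not name the classifier; the grid values, the 8 kHz sampling rate, and the synthetic tone signals are hypothetical; and `mfcc_features`, `loo_cv_accuracy`, and `select_hyperparameters` are illustrative names. The window (Hamming) and the half-frame overlap are fixed, as in the text.

```python
import itertools
import numpy as np

def mfcc_features(x, sr, frame_ms, n_mfcc, n_segments, n_filters=26):
    """Flattened MFCCs of the first n_segments frames (Hamming, 50% overlap)."""
    frame_len = int(sr * frame_ms / 1000)
    hop = frame_len // 2
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_segments)[:, None]
    frames = x[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / frame_len
    # Triangular mel filterbank (common mel mapping, assumed).
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    up = (freqs[None, :] - edges[:-2, None]) / (edges[1:-1, None] - edges[:-2, None])
    down = (edges[2:, None] - freqs[None, :]) / (edges[2:, None] - edges[1:-1, None])
    fbank = np.maximum(0.0, np.minimum(up, down))
    log_mel = np.log(power @ fbank.T + 1e-10)        # (n_segments, n_filters)
    k = np.arange(n_mfcc)[:, None]
    i = np.arange(n_filters)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n_filters))  # DCT-II basis
    return (log_mel @ basis.T).ravel()

def loo_cv_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour stand-in classifier."""
    correct = 0
    for i in range(len(X)):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf                 # hold out sample i
        correct += int(y[np.argmin(dist)] == y[i])
    return correct / len(X)

def select_hyperparameters(signals, labels, sr, grid):
    """Return (best accuracy, (frame_ms, n_mfcc, n_segments)) over the grid."""
    best_acc, best_params = -1.0, None
    for frame_ms, n_mfcc, n_segments in itertools.product(*grid):
        X = np.array([mfcc_features(s, sr, frame_ms, n_mfcc, n_segments)
                      for s in signals])
        acc = loo_cv_accuracy(X, labels)
        if acc > best_acc:
            best_acc, best_params = acc, (frame_ms, n_mfcc, n_segments)
    return best_acc, best_params

# Hypothetical data: two classes of noisy tones, not real cough recordings.
rng = np.random.default_rng(0)
sr = 8000
t = np.arange(2000) / sr
signals = [np.sin(2 * np.pi * f0 * t) + 0.1 * rng.standard_normal(t.size)
           for f0 in [300.0] * 6 + [1500.0] * 6]
labels = np.array([0] * 6 + [1] * 6)

grid = ([20, 30], [8, 13], [4, 8])  # frame length (ms), MFCC count, segments
best_acc, best_params = select_hyperparameters(signals, labels, sr, grid)
print(best_acc, best_params)
```

With leave-one-out, every recording serves once as the test sample, which suits small datasets where holding out a separate validation set would waste data.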

This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
