MFCCs are among the most popular and successful features in voice analysis and automatic speech recognition systems [44]. MFCC extraction is a digital signal analysis technique that approximates the perception of the human ear and is computed from the Fast Fourier Transform (FFT). Because the characteristics of a speech signal remain stable only over a very short time interval (about 20–30 ms), the signal is processed in short intervals of this length [45, 46]. Each such interval is called a frame. Frames are usually chosen to overlap so that transitions between frames are smoother. As in the calculation of the spectrogram, each frame is multiplied by a window function to avoid discontinuities at its beginning and end; the most commonly used window is the Hamming window. After windowing, the FFT is applied to transform each frame from the time domain to the frequency domain. The mel is a unit designed to imitate the perceptual frequency resolution of the human ear. The conversion between the frequency scale and the mel scale is commonly written as
$$m = 2595 \log_{10}\left(1 + \frac{f}{700}\right),$$
where $f$ is the frequency in Hz and $m$ is the corresponding value in mels.
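The framing, windowing, and FFT steps described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the study: the function names, the 25 ms default frame length, and the specific mel mapping $m = 2595 \log_{10}(1 + f/700)$ are assumptions chosen because they are common defaults.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Map frequency in Hz to mels (assumed common form of the mapping)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def frame_power_spectrum(signal, sample_rate, frame_ms=25.0, overlap=0.5):
    """Split a signal into overlapping Hamming-windowed frames and return
    the power spectrum of each frame, shape (n_frames, n_bins)."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop

    # Row i of idx holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # Real-input FFT of every frame, converted to power.
    spectrum = np.fft.rfft(frames, axis=1)
    return (np.abs(spectrum) ** 2) / frame_len


# Usage: a one-second 440 Hz tone sampled at 16 kHz.
sr = 16000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
power = frame_power_spectrum(tone, sr)
print(power.shape)  # (79, 201): 79 half-overlapping frames, 201 frequency bins
```

With a 25 ms frame at 16 kHz, each frame holds 400 samples and the frequency bins are 40 Hz apart, so the tone's energy concentrates in the bin centred on 440 Hz.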
MFCCs thus represent the short-time power spectrum of the sound signal on the mel scale [47, 48]. Computing MFCCs for a cough sound yields an M × N matrix, where M is the number of MFCCs and N is the number of segments (frames).
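To make the shape of this matrix concrete, the sketch below computes an M × N MFCC matrix with NumPy alone. It is an illustration under stated assumptions rather than the study's code: the triangular mel filterbank, the logarithm, and the unnormalised DCT-II are the conventional steps of an MFCC pipeline and are not spelled out in the text, and the function names and defaults (26 filters, 13 coefficients, 25 ms frames) are hypothetical choices.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                          n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    fb = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def dct_ii_matrix(n_out, n_in):
    """First n_out rows of the (unnormalised) DCT-II basis of length n_in."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2.0 * n_in))


def mfcc(signal, sample_rate, frame_ms=25.0, n_mfcc=13, n_filters=26):
    """Return an (n_mfcc, n_frames) matrix, i.e. M x N in the text's notation."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = frame_len // 2  # overlap of half the frame length, as in the text
    n_frames = 1 + (len(signal) - frame_len) // hop

    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / frame_len

    fb = mel_filterbank(n_filters, frame_len, sample_rate)
    log_mel = np.log(power @ fb.T + 1e-10)  # small offset avoids log(0)
    coeffs = log_mel @ dct_ii_matrix(n_mfcc, n_filters).T
    return coeffs.T


# Usage: one second of noise at 16 kHz stands in for a cough recording.
rng = np.random.default_rng(0)
features = mfcc(rng.standard_normal(16000), 16000)
print(features.shape)  # (13, 79): M = 13 coefficients, N = 79 frames
```

Each column of the result describes one frame, so changing the frame length changes N, while changing the number of retained coefficients changes M.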
In the literature, MFCCs have been used for the classification of cough sounds. For example, in [20], features were extracted with the MFCC method to classify dry and wet coughs. Extracting features with the MFCC method requires attention to several important factors, called hyperparameters: the window type, the frame length, the frame overlap length, the number of segments used for feature extraction, and the number of MFCCs. In this study, the Hamming window was chosen, and the frame overlap length was set to half of the frame length. The optimum values of the other three hyperparameters (the frame length, the number of MFCCs, and the number of segments used for feature extraction) were selected using the leave-one-out cross-validation (LOO-CV) strategy.
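The selection of hyperparameters by LOO-CV can be sketched as a grid search in which each candidate setting is scored by leave-one-out accuracy. Everything specific in this sketch is an assumption made for illustration: the text does not name the classifier or the candidate values here, so a 1-nearest-neighbour rule stands in as a placeholder, and `extract` is a hypothetical user-supplied function that turns one recording into a fixed-length feature vector for a given hyperparameter setting.

```python
from itertools import product


def nearest_neighbour_predict(train_x, train_y, x):
    """Placeholder classifier (1-NN, squared Euclidean distance).
    This is NOT the model used in the study; any classifier could be swapped in."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_x]
    return train_y[dists.index(min(dists))]


def loo_cv_accuracy(features, labels, predict=nearest_neighbour_predict):
    """Hold out each sample once, train on the rest, and return the accuracy."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


def select_hyperparameters(recordings, labels, extract, grid):
    """Try every combination in `grid` and keep the one with the best LOO-CV
    accuracy. `extract(recording, **params)` is a hypothetical feature
    extractor returning a fixed-length list of floats."""
    names = sorted(grid)
    best_params, best_acc = None, -1.0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        feats = [extract(rec, **params) for rec in recordings]
        acc = loo_cv_accuracy(feats, labels)
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc


# Toy usage: each "recording" is already a short feature list, and the
# hypothetical extractor simply keeps the first n_mfcc values.
toy_recordings = [[0.0, 0.0], [1.0, 0.1], [0.1, 5.0], [1.1, 5.1]]
toy_labels = ["dry", "dry", "wet", "wet"]


def toy_extract(rec, n_mfcc):
    return rec[:n_mfcc]


best, acc = select_hyperparameters(
    toy_recordings, toy_labels, toy_extract, {"n_mfcc": [1, 2]}
)
print(best, acc)  # {'n_mfcc': 2} 1.0
```

In a real setting the grid would also contain the frame length and the number of segments, and `extract` would compute the MFCC matrix for each recording and flatten or summarise it into a vector.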