The vocal task selected was sustained phonation of the vowel /a/, which offers several advantages: it is widely used in the scientific literature; it is simple for participants to perform, which avoids fatiguing them, particularly patients in more advanced stages of PD; it is easy to analyze and control; it is common to different languages; and it is unaffected by phonetic context or intonation [12].
The recording task for the UEX database consisted of three 5-second phonations of the vowel /a/, each produced continuously and without interruption while holding pitch and loudness as constant as possible.
Because of biological variability, voice recordings from the same subject yield similar but not identical waveforms. As a consequence, features extracted from different recordings of the same individual are not identical either. To obtain more stable predictors, three utterances were recorded per subject so that the feature values could later be averaged into a single feature vector per subject.
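The averaging step described above can be illustrated with a minimal sketch. It assumes that features have already been extracted into one numeric vector per recording; the function name `average_features`, the subject identifier and the feature values are hypothetical and are not taken from the study.

```python
import numpy as np


def average_features(features_by_subject):
    """Average the per-recording feature vectors of each subject.

    features_by_subject maps a subject ID to a list of feature vectors,
    one per recording (three per subject in the protocol described above).
    """
    averaged = {}
    for subject_id, vectors in features_by_subject.items():
        # Shape: (n_recordings, n_features)
        stacked = np.asarray(vectors, dtype=float)
        # Element-wise mean over recordings gives one vector per subject
        averaged[subject_id] = stacked.mean(axis=0)
    return averaged


# Hypothetical example: one subject, three recordings, two features each
example = {"S01": [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]}
result = average_features(example)
```

Each subject ends up with a single vector whose length equals the number of features, regardless of how many recordings contributed to it.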
All voice recordings were made with the same smartphone (model BQ Aquaris V) at a sampling frequency of 44.1 kHz. The recordings were taken at the facilities of the Regional Association for Parkinson’s Disease of Extremadura (Spain), always in the same room, which was relatively quiet but had no special acoustic isolation. A specialist was present to ensure that all participants followed the voice recording protocol correctly and to register the complementary information based on medical reports.
Voice recordings from mPower were made on participants’ iPhones (4th generation or newer) or iPods (5th generation or newer) using the /a/ vowel phonation protocol. A sampling frequency of 44.1 kHz was used. Since participants recorded themselves without supervision, this database covers a variety of acoustic environments. Participants were also responsible for filling in the form with the complementary information, which makes these data somewhat unreliable.
Before feature extraction, all recordings from both databases were trimmed to one second, discarding any leading or trailing silence. Other authors have considered this length sufficient for extracting speech features from sustained vowel phonations [40]. The recordings were edited with Audacity (release 2.0.5).
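The recordings in this study were edited manually in Audacity. For readers who prefer a scripted equivalent, the trimming step could be approximated as in the sketch below. This is an illustration only, not the authors' procedure: the amplitude threshold used to detect silence, the choice of the central one-second window and the function name `trim_to_one_second` are all assumptions.

```python
import numpy as np


def trim_to_one_second(signal, sample_rate=44100, threshold=0.02, duration_s=1.0):
    """Drop leading/trailing silence and keep a fixed-length central segment.

    threshold is a hypothetical absolute-amplitude cutoff for "silence"
    (signal assumed to be scaled to the range [-1, 1]).
    """
    signal = np.asarray(signal, dtype=float)
    # Indices of samples loud enough to count as voiced
    active = np.flatnonzero(np.abs(signal) > threshold)
    if active.size == 0:
        raise ValueError("no samples above the silence threshold")
    # Keep everything between the first and last non-silent sample
    voiced = signal[active[0]:active[-1] + 1]
    n_keep = int(round(duration_s * sample_rate))
    if voiced.size < n_keep:
        raise ValueError("voiced segment is shorter than the requested duration")
    # Assumption: keep the central window of the voiced segment
    offset = (voiced.size - n_keep) // 2
    return voiced[offset:offset + n_keep]


# Synthetic example: 0.5 s silence, 3 s of a 150 Hz tone, 0.5 s silence
sr = 44100
t = np.arange(3 * sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 150 * t)
recording = np.concatenate([np.zeros(sr // 2), tone, np.zeros(sr // 2)])
trimmed = trim_to_one_second(recording, sample_rate=sr)
```

A fixed amplitude threshold is the simplest possible silence detector and would likely need tuning for recordings made in noisier environments, such as the unsupervised mPower recordings.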