The pitch and formants of the vowels were measured using Praat (Boersma, 2001) based on median values for the middle third of each vowel. For the primary speaker, these measurements were as follows: f0 = 98 ± 1 Hz; [ɑ] F1 = 768 ± 7 Hz; [ɑ] F2 = 1137 ± 41 Hz; [i] F1 = 297 ± 16 Hz; [i] F2 = 2553 ± 33 Hz. For the oddball speaker, the measurements were: f0 = 115 ± 1 Hz; [ɑ] F1 = 756 ± 9 Hz; [ɑ] F2 = 1238 ± 153 Hz; [i] F1 = 327 ± 6 Hz; [i] F2 = 2123 ± 27 Hz.
Four “formant bands” were defined based on the formant peaks of the vowel stimuli (Figure 1B). To maximize the signal-to-noise ratio by including as many voxels as possible in the formant-based ROIs, each band was made as wide as possible without overlapping any adjacent band. Where there was no relevant adjacent band, the band was defined to be symmetrical around its formant peak. These calculations are described in detail in the following paragraphs.
The [i] F1 peak was 297 Hz. The relevant adjacent peaks were f0 (peak = 98 Hz) and [ɑ] F1 (peak = 768 Hz). Therefore, the lower bound of the [i] F1 band was defined as the logarithmic mean (i.e., the geometric mean, which is the midpoint on a log-frequency scale) of 98 Hz and 297 Hz, which is 171 Hz, and the upper bound was defined as the logarithmic mean of 297 Hz and 768 Hz, which is 478 Hz. Logarithmic means were used to account for the nonlinear representation of frequency in the auditory system.
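The band edges here are midpoints on a log-frequency axis: averaging the log-frequencies and exponentiating gives the geometric mean of the two peaks, which reproduces the 171 Hz and 478 Hz values. A minimal Python sketch (the function name `log_midpoint` is ours, chosen for illustration):

```python
import math


def log_midpoint(f_low: float, f_high: float) -> float:
    """Boundary halfway between two frequencies on a log scale.

    Averaging the log-frequencies and exponentiating is the same as
    taking the geometric mean, sqrt(f_low * f_high).
    """
    return math.exp((math.log(f_low) + math.log(f_high)) / 2)


F0 = 98.0     # primary speaker's f0 (Hz)
I_F1 = 297.0  # [i] F1 peak (Hz)
A_F1 = 768.0  # [ɑ] F1 peak (Hz)

lower = log_midpoint(F0, I_F1)
upper = log_midpoint(I_F1, A_F1)
print(f"[i] F1 band: {lower:.0f}-{upper:.0f} Hz")  # [i] F1 band: 171-478 Hz
```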
The [ɑ] F1 peak was 768 Hz. The adjacent formants were [i] F1 below and [ɑ] F2 (peak = 1137 Hz) above. The lower bound of the [ɑ] F1 band was defined as 478 Hz (the boundary with [i] F1 as just described), and the upper bound was defined as the logarithmic mean of 768 Hz and 1137 Hz, which is 934 Hz.
The [ɑ] F2 peak was 1137 Hz. The [ɑ] F1 formant was adjacent below, so the lower bound of the [ɑ] F2 band was defined as 934 Hz (as just described). There was no relevant formant immediately adjacent above, so the upper bound was set such that the [ɑ] F2 band would be symmetrical (on a logarithmic scale) around the peak, i.e., the upper bound was defined as 1383 Hz.
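The log-symmetric bound follows from requiring the two edges to lie at equal distances from the peak on a log-frequency axis. Writing $p$ for the peak, $\ell$ for the lower bound, and $u$ for the upper bound (notation introduced here for illustration):

$$\log u - \log p = \log p - \log \ell \quad\Longrightarrow\quad u = \frac{p^{2}}{\ell}$$

For [ɑ] F2, with $p = 1137$ Hz and the unrounded lower bound $\ell = \sqrt{768 \times 1137} \approx 934.46$ Hz:

$$u = \frac{1137^{2}}{934.46} = \frac{1292769}{934.46} \approx 1383\ \text{Hz}$$

Substituting the rounded 934 Hz instead gives about 1384 Hz, so the 1383 Hz value appears to have been computed from the unrounded bound. Solving the same relation for the lower edge gives $\ell = p^{2}/u$, which applies when the upper edge is the constrained one.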
The [i] F2 peak was 2553 Hz. While no other first or second formants were adjacent above, the [ɑ] F3 formant (peak = 2719 Hz) was adjacent above, so the upper bound of the [i] F2 band was defined as the logarithmic mean of 2553 Hz and 2719 Hz, which is 2635 Hz. There was no relevant formant immediately adjacent below, so the lower bound for the [i] F2 band was set such that the band would be symmetrical (on a logarithmic scale) around its peak, i.e., the lower bound was set to 2474 Hz.
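All four band definitions can be reproduced in a few lines. In this sketch (the helper names `log_midpoint` and `mirror` are illustrative, not from the original analysis), each shared edge is the geometric mean of neighbouring peaks, and each edge with no relevant neighbour is mirrored about its own peak on a log scale; the peak values are those given in the text.

```python
import math


def log_midpoint(a: float, b: float) -> float:
    """Geometric mean: the midpoint of a and b on a log-frequency axis."""
    return math.sqrt(a * b)


def mirror(peak: float, edge: float) -> float:
    """Reflect `edge` about `peak` on a log scale (log-symmetric band)."""
    return peak * peak / edge


# Spectral peaks of the primary speaker's stimuli (Hz), from the text.
F0, I_F1, A_F1, A_F2, I_F2, A_F3 = 98.0, 297.0, 768.0, 1137.0, 2553.0, 2719.0

bands = {}
bands["[i] F1"] = (log_midpoint(F0, I_F1), log_midpoint(I_F1, A_F1))
bands["[ɑ] F1"] = (log_midpoint(I_F1, A_F1), log_midpoint(A_F1, A_F2))

# No relevant neighbour above [ɑ] F2: mirror the lower edge about the peak.
a_f2_low = log_midpoint(A_F1, A_F2)
bands["[ɑ] F2"] = (a_f2_low, mirror(A_F2, a_f2_low))

# No relevant neighbour below [i] F2: mirror the upper edge about the peak.
i_f2_high = log_midpoint(I_F2, A_F3)
bands["[i] F2"] = (mirror(I_F2, i_f2_high), i_f2_high)

for name, (low, high) in bands.items():
    print(f"{name}: {low:.0f}-{high:.0f} Hz")
```

Unrounded edges are carried through each step, which is why the mirrored [ɑ] F2 upper edge comes out at 1383 Hz rather than 1384 Hz.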
Note that although all four formant bands showed differential energy for the two vowels, the difference in energy was considerably greater for the two [ɑ] formant bands (Figure 1B). This was due in part to energy from [ɑ] f0 and F3 impinging on the [i] F1 and F2 bands, respectively.
The four formant bands ([ɑ] F1, [i] F1, [ɑ] F2, [i] F2) were crossed with the four anatomical ROIs (left HG, right HG, left STG, right STG; based on the Desikan-Killiany atlas) to create sixteen ROIs for analysis. Each ROI comprised all voxels within the anatomical region that were tonotopic, as indicated by F > 3.03 (p < 0.05, uncorrected) in the phase-encoded Fourier analysis, and whose best frequency fell within the formant band. ROIs were required to include at least two voxels. Because tonotopic regions can be small and somewhat variable across individuals, not every participant had at least two voxels in every ROI. In these instances, data points for the ROI(s) in question were coded as missing, while data points for that participant's other ROIs were retained.
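The ROI selection rule can be sketched as a filter over voxels. The `Voxel` record, the region labels, and treating each band as a half-open interval [low, high) so that a voxel on a shared edge is assigned to only one band are all our assumptions for illustration; the band edges, the F threshold, and the two-voxel minimum are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Voxel:
    voxel_id: int
    region: str          # anatomical label, e.g. "Left HG" (hypothetical)
    f_stat: float        # F from the phase-encoded Fourier analysis
    best_freq_hz: float  # best frequency from the tonotopic mapping


# Band edges (Hz) as given in the text; half-open [low, high) is assumed.
BANDS = {
    "[i] F1": (171.0, 478.0),
    "[ɑ] F1": (478.0, 934.0),
    "[ɑ] F2": (934.0, 1383.0),
    "[i] F2": (2474.0, 2635.0),
}
REGIONS = ("Left HG", "Right HG", "Left STG", "Right STG")


def build_rois(
    voxels: list[Voxel], f_threshold: float = 3.03, min_voxels: int = 2
) -> dict[tuple[str, str], Optional[list[int]]]:
    """Map each (region, band) pair to its voxel ids, or None if too few."""
    rois: dict[tuple[str, str], Optional[list[int]]] = {}
    for region in REGIONS:
        for band, (low, high) in BANDS.items():
            ids = [
                v.voxel_id
                for v in voxels
                if v.region == region
                and v.f_stat > f_threshold        # tonotopic voxel
                and low <= v.best_freq_hz < high  # best frequency in band
            ]
            # ROIs with fewer than min_voxels voxels are coded as missing.
            rois[(region, band)] = ids if len(ids) >= min_voxels else None
    return rois
```

For example, two left-HG voxels with F above threshold and best frequencies of 300 Hz and 350 Hz form a valid left HG × [i] F1 ROI, whereas a single qualifying right-STG voxel at 800 Hz leaves the right STG × [ɑ] F1 ROI coded as missing.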
To investigate responses to the two vowels in the four formant bands crossed with the four anatomical ROIs, a mixed model was fit using lme4 (Bates et al., 2015) in R (R Core Team, 2018). There were five fixed effects, each with two levels. Two effects pertained to the anatomical ROI: region (HG, STG) and hemisphere (left, right). Two effects pertained to the formant band: formant number (was the band defined based on the first or the second formant?) and “ROI-defining vowel” (was the band defined based on the spectral peaks of [ɑ] or [i]?). The fifth effect, referred to as “presented vowel”, indicated the vowel for which the response was estimated. All main effects and all interactions (full factorial) were included in the model. Participant was modeled as a random effect, with a separate intercept fit for each participant. The dependent measure was the estimated signal change (β) relative to rest, averaged across the three runs and all voxels in the ROI. The primary effect of interest was the ROI-defining vowel × presented vowel interaction, which tests the main study hypothesis. All higher-level interactions involving ROI-defining vowel and presented vowel were also of interest, to determine whether any observed pattern was modulated by region, hemisphere, or formant number. P values were obtained by likelihood ratio tests comparing models with and without each effect in question; both models included all higher-level interactions that did not involve that effect. Null distributions for the likelihood ratio test statistic $2(l_F - l_R)$, where $l_F$ is the log likelihood of the full model and $l_R$ is the log likelihood of the reduced model, were derived using a parametric bootstrap approach (Faraway, 2016). Our study was adequately powered to detect large effects: with 12 participants, power was ≥ 80% for contrasts with an effect size of d ≥ 0.89 (two-tailed).
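The parametric bootstrap for the likelihood ratio statistic can be illustrated on a deliberately small problem instead of the five-factor mixed model: testing whether a normal mean is zero. The reduced model is fit to the data, new data sets are simulated from it, both models are refit to each simulated data set, and the p value is the proportion of simulated statistics at least as large as the observed one. This is our own toy illustration, not the authors' code; in the actual analysis each fit would be an lme4 mixed model rather than a closed-form normal likelihood.

```python
import math
import random
import statistics


def normal_loglik(xs: list[float], mu: float) -> float:
    """Maximized log likelihood of N(mu, sigma^2), sigma at its MLE given mu."""
    n = len(xs)
    sigma2 = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def lrt_stat(xs: list[float]) -> float:
    """2(l_F - l_R): the full model estimates mu; the reduced model fixes mu = 0."""
    l_full = normal_loglik(xs, statistics.fmean(xs))
    l_reduced = normal_loglik(xs, 0.0)
    return 2 * (l_full - l_reduced)


def bootstrap_p(xs: list[float], n_boot: int = 2000, seed: int = 1) -> tuple[float, float]:
    """Observed LRT statistic and its parametric-bootstrap p value."""
    rng = random.Random(seed)
    observed = lrt_stat(xs)
    # Fit the reduced model (mu = 0): the MLE of sigma is the RMS of the data.
    sigma_r = math.sqrt(sum(x * x for x in xs) / len(xs))
    exceed = 0
    for _ in range(n_boot):
        # Simulate a data set from the fitted reduced model, refit both models.
        sim = [rng.gauss(0.0, sigma_r) for _ in xs]
        if lrt_stat(sim) >= observed:
            exceed += 1
    return observed, exceed / n_boot
```

Simulating from the fitted reduced model, rather than relying on the asymptotic chi-squared reference distribution, is what makes the null distribution appropriate for small samples such as 12 participants.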
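The power statement can be checked by simulation if, as we assume here, the contrasts are within-participant (paired) comparisons tested with a two-tailed t-test at α = 0.05 across 12 participants; the text does not name the test. The critical value 2.200985 is the 97.5th percentile of the t distribution with 11 degrees of freedom, hard-coded so the sketch uses only the standard library, which ties it to n = 12.

```python
import math
import random

T_CRIT_DF11 = 2.200985  # two-tailed 5% critical value of t with 11 df (n = 12)


def simulated_power(
    d: float, n: int = 12, t_crit: float = T_CRIT_DF11,
    n_sims: int = 20000, seed: int = 0,
) -> float:
    """Monte Carlo power of a two-tailed one-sample (paired) t-test."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Within-participant differences with SD 1, so the mean equals d.
        diffs = [rng.gauss(d, 1.0) for _ in range(n)]
        mean = sum(diffs) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in diffs) / (n - 1))
        t = mean / (sd / math.sqrt(n))
        if abs(t) > t_crit:
            hits += 1
    return hits / n_sims
```

Under these assumptions, `simulated_power(0.89)` comes out at roughly 0.80, consistent with the stated sensitivity, and `simulated_power(0.0)` stays near the nominal 0.05.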