Consonance models

Raja Marjieh, Peter M. C. Harrison, Harin Lee, Fotini Deligiannaki, Nori Jacoby

While many psychoacoustic accounts of consonance perception have been presented over the centuries, recent literature has converged on two main candidate explanations: (1) interference between partials, and (2) harmonicity (see ref. 20 for a review). We address both accounts in this paper using computational modeling, as described below.

According to interference accounts, consonance reflects interference between the partials in the chord’s frequency spectrum. The nature of this interference depends on the distance between the partials. Distant partials and very close partials elicit minimal interference; however, partials separated by a moderately small distance (of the order of a semitone) elicit a large amount of unpleasant interference. This interference is most commonly thought to derive from fast amplitude fluctuation (beating26) but may also reflect masking8,82.

The literature contains many interference-based consonance models (see ref. 20 for a review). These models vary in their mechanistic complexity, but interestingly the older and simpler models seem to perform better on current empirical data20. Here we use a collection of so-called pure-dyad interference models31,41,43 which calculate interference by summing contributions from all pairs of partials in the acoustic spectrum, where each pairwise contribution is calculated as an idealized function of the partial amplitudes and the frequency distance between the partials. We focus particularly on the model of Hutchinson and Knopoff41, which performed the best of all 21 models evaluated in Harrison and Pearce20, but we also explore the other models in the Supplementary Information. We avoided testing more complex waveform-based models (e.g. refs. 83,84) because of their high computational demands and relatively low predictive performance20.

At the core of the Hutchinson-Knopoff model is a dissonance curve that specifies the relative interference between two partials as a function of their frequency distance, expressed in units of critical bandwidths (Supplementary Fig. 11). This relative interference is converted into absolute interference by multiplying by the product of the amplitudes of the two partials. The main differences between the Hutchinson-Knopoff, Sethares, and Vassilakis models lie in the precise shapes of their dissonance curves and the precise nature of their amplitude weighting.

The original presentation of the Hutchinson-Knopoff model defined the dissonance curve solely in graphical form. Here we use a parametric approximation of this curve introduced by Bigand et al.50: D(x) = (4x exp(1 − 4x))^2, where D(x) is the dissonance contribution of a pair of partials separated by critical bandwidth distance x (Supplementary Fig. 11, top panel). They model critical bandwidth distance (x) as a function of the frequencies of the two partials f1, f2:

x = |f1 − f2| / (1.72 f̄^0.65), where f̄ = (f1 + f2)/2 is the mean frequency in Hz.
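To make the pairwise computation concrete, here is a minimal Python sketch (function names and structure are ours, not the authors' code). It combines the Bigand et al. parametric curve D(x) = (4x exp(1 − 4x))^2 with the Plomp-Levelt critical-bandwidth approximation adopted by Hutchinson and Knopoff, and omits the normalization by summed squared amplitudes used in the full model:

```python
import numpy as np

def critical_bandwidth_distance(f1, f2):
    """Distance between two partials in units of critical bandwidths,
    using the Plomp-Levelt approximation CBW(f) = 1.72 * f**0.65
    (frequencies in Hz)."""
    f_mean = (f1 + f2) / 2.0
    cbw = 1.72 * f_mean ** 0.65
    return abs(f2 - f1) / cbw

def dissonance_curve(x):
    """Bigand et al. parametric curve: D(x) = (4x * exp(1 - 4x))**2.
    Peaks at x = 0.25 critical bandwidths, where D = 1."""
    return (4 * x * np.exp(1 - 4 * x)) ** 2

def pure_dyad_interference(freqs, amps):
    """Sum pairwise dissonance contributions over all pairs of
    partials, weighting each by the product of the amplitudes
    (the original intensity-like weighting)."""
    total = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            x = critical_bandwidth_distance(freqs[i], freqs[j])
            total += amps[i] * amps[j] * dissonance_curve(x)
    return total
```

As expected from the curve, two equal-amplitude partials a semitone apart (e.g. 440 and 466 Hz) yield far more interference than partials an octave apart.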

Our revised model (as included in the composite model plots) includes several additional changes motivated by the results of our experiments. These changes involve new model parameters (p, q, and r) which were optimized numerically following the steps described in Parameter optimization (see below).

First, the dissonance curve D(x) is revised to incorporate a preference for slow beats (Supplementary Fig. 11, bottom panel):

where p=0.096 is the slow-beat boundary (the distance at which the pleasantness of slow beats starts contributing) and q=1.632 is the slow-beat pleasantness (the strength of the slow-beat pleasantness effect).

The original Hutchinson-Knopoff model sums together dissonance contributions from all pairs of partials, weighting each contribution by the product of those partials’ amplitudes. The consequence is that dissonance is proportional to sound intensity.

In our revised model, we add an additional parameter r that nuances this amplitude weighting. Setting r=2 recovers the intensity-weighting of the original model. Setting r=1 makes dissonance proportional to amplitude rather than intensity. The optimized value of r=1.359 sits somewhere between these two interpretations:
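The exact form of the generalized weighting is not reproduced here, but one reading consistent with the description above (an assumption on our part, not the published formula) is to raise the amplitude product to the power r/2, so that r = 2 recovers the product a_i * a_j of the original model and r = 1 yields a weight that scales with amplitude rather than intensity:

```python
def pair_weight(a_i, a_j, r=1.359):
    """Hypothetical generalized amplitude weighting for one pair of
    partials: (a_i * a_j) ** (r / 2). With r = 2 this is the original
    product-of-amplitudes (intensity-like) weighting; with r = 1 the
    weight is proportional to amplitude. r = 1.359 is the optimized
    value quoted in the text."""
    return (a_i * a_j) ** (r / 2)
```

Under this reading, doubling both amplitudes quadruples the weight when r = 2 but only doubles it when r = 1.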

According to harmonicity accounts, consonance is grounded in the mechanisms of pitch perception. Pitch perception involves combining multiple related spectral components into a unitary perceptual image, a process thought to be accomplished either by template-matching in the spectral domain or autocorrelation in the temporal domain (see ref. 85 for a review). Consonance perception can then be modeled in terms of how well a particular chord supports these pitch perception processes. Here we test three such models: two based on template-matching42,45 and one based on autocorrelation, after Boersma44 (see below for details). We focus particularly on the model of Harrison and Pearce42 because of its high performance in Harrison and Pearce20, but we also explore the other models in the Supplementary Information. We excluded several other candidate models because they are insensitive to spectral manipulations, the main focus of this paper5,17,86.

Following Milne45, the Harrison-Pearce model uses a harmonic template corresponding to an idealized harmonic complex tone. The template is expressed in the pitch-class domain, a form of pitch notation where pitches separated by integer numbers of octaves are labeled with the same pitch class. It can be transposed to represent different candidate pitches; Supplementary Fig. 16a shows templates for C, D, E, and F.

Each input chord is likewise expressed as an idealized spectrum in the pitch-class domain, after Milne45 (Supplementary Fig. 16b). This involves expanding each chord tone into its implied harmonics, making sure to capture any available information about the strength of the harmonics (e.g., spectral roll-off) and their location (e.g., stretched versus non-stretched).

A profile of virtual pitch strength is then created by calculating the cosine similarity between the chord’s spectrum and different transpositions of the harmonic template, after Milne45 (Supplementary Fig. 16b). For example, the virtual pitch strength at 2 corresponds to the cosine similarity between the chord’s spectrum and a harmonic template with a pitch class of 2 (i.e., a D pitch template).
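A minimal sketch of this template-matching pipeline follows (our simplification of Milne's approach: 1-cent pitch-class bins, 1/h amplitude roll-off, and Gaussian smoothing so that nearly coinciding partials count as matching; the parameter values are illustrative, not the published ones):

```python
import numpy as np

N = 1200  # pitch-class bins at 1-cent resolution

def smooth(spec, sigma_cents=6.0):
    """Circular Gaussian smoothing of a pitch-class spectrum."""
    cents = np.arange(N)
    d = np.minimum(cents, N - cents)  # circular distance from bin 0
    kernel = np.exp(-0.5 * (d / sigma_cents) ** 2)
    return np.real(np.fft.ifft(np.fft.fft(spec) * np.fft.fft(kernel)))

def pc_spectrum(midi_pitches, n_harmonics=12):
    """Idealized pitch-class spectrum: expand each tone into its
    harmonics (amplitude 1/h) and map them to pitch-class bins."""
    spec = np.zeros(N)
    for p in midi_pitches:
        for h in range(1, n_harmonics + 1):
            pc_cents = ((p + 12 * np.log2(h)) % 12) * 100
            spec[int(round(pc_cents)) % N] += 1 / h
    return smooth(spec)

def cosine(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def virtual_pitch_profile(chord):
    """Cosine similarity between the chord's spectrum and the
    harmonic template transposed to each of the 12 pitch classes."""
    template = pc_spectrum([0])  # template for pitch class 0 (C)
    spec = pc_spectrum(chord)
    return np.array([cosine(spec, np.roll(template, pc * 100))
                     for pc in range(12)])
```

For a C major triad (pitch classes 0, 4, 7), the strongest virtual pitch in this sketch is the root, pitch class 0.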

Finally, harmonicity is estimated as a summary statistic of the virtual pitch strength profile. The Harrison-Pearce model treats this profile as a probability distribution and computes its information-theoretic uncertainty, equivalent (up to sign and an additive constant) to the Kullback-Leibler divergence of this distribution from a uniform distribution; high uncertainty means an unclear pitch and hence low harmonicity. Milne's45 model starts from the same profile but instead returns the height of its highest peak.
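Both summary statistics are straightforward given a virtual pitch strength profile. A sketch (our code, not the authors'): the KL divergence of a normalized profile p from the uniform distribution equals log n − H(p), so a flat (maximally uncertain) profile scores 0 and a one-hot (perfectly clear) profile scores log n:

```python
import numpy as np

def harmonicity_kl(profile):
    """Harrison-Pearce-style statistic: normalize the profile to a
    probability distribution and return its KL divergence from
    uniform, i.e. log(n) - entropy. Flat profile (ambiguous pitch)
    -> 0; peaked profile (clear pitch, high harmonicity) -> high."""
    p = np.asarray(profile, dtype=float)
    p = p / p.sum()
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log(nonzero))
    return np.log(len(p)) - entropy

def harmonicity_peak(profile):
    """Milne-style statistic: height of the highest peak."""
    return float(np.max(profile))
```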

The autocorrelation model uses the fundamental-frequency estimator of Boersma44, as implemented in the Praat software and accessed via the Parselmouth package87. The algorithm works by looking for the maximum of the sound’s autocorrelation function (i.e., the temporal interval at which the sound correlates maximally with itself).
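The core idea, choosing the lag at which the autocorrelation is maximal, can be sketched as follows (a simplified illustration only; Praat's actual algorithm additionally involves windowing, normalization by the window's autocorrelation, and interpolation):

```python
import numpy as np

def estimate_f0_autocorr(signal, sr, fmin=50.0, fmax=1000.0):
    """Simplified autocorrelation pitch estimate: return sr / lag for
    the lag (within the allowed range) that maximizes the signal's
    autocorrelation with itself."""
    x = signal - np.mean(signal)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..N-1
    lo = int(sr / fmax)  # shortest lag considered
    hi = int(sr / fmin)  # longest lag considered
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

# A 100 ms, 220 Hz sine for illustration
sr = 44100
t = np.arange(int(0.1 * sr)) / sr
tone = np.sin(2 * np.pi * 220 * t)
```

On this tone the estimator recovers a fundamental close to 220 Hz (exactness is limited by the integer-lag resolution).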

The following steps were used to estimate the harmonicity of a given chord:

Our composite model combines both interference41 and harmonicity42 models, including the modifications described above. The two models are combined additively with a weight of −1 for the interference model and +0.837 for the harmonicity model (see Parameter optimization for details). When applying the composite model we also median-normalize the outputs within each experiment (i.e., subtracting the median value across the whole experiment from each model output), reflecting the way in which participants calibrate their scale usage to the stimuli within each experiment. Note that we plot the final version of the composite model throughout the paper, rather than plotting incremental versions as motivated by each experiment, to ensure that later model changes do not spoil predictions for earlier experiments.
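The combination step can be sketched as follows (our code; the weights are those quoted above):

```python
import numpy as np

# Weights from the text: -1 for interference, +0.837 for harmonicity.
W_INTERFERENCE, W_HARMONICITY = -1.0, 0.837

def composite_scores(interference, harmonicity):
    """Additive combination of the two model outputs, followed by
    median normalization within an experiment (subtract the median
    score across all stimuli in the experiment)."""
    score = (W_INTERFERENCE * np.asarray(interference, dtype=float)
             + W_HARMONICITY * np.asarray(harmonicity, dtype=float))
    return score - np.median(score)
```

By construction the returned scores have zero median, and higher interference lowers the composite score while higher harmonicity raises it.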

Our new models include several key parameters, listed in Supplementary Table 3. We set initial versions of these parameters based on a combination of theory and manual exploration of the data; we then numerically optimized these parameters using a gradient-free optimizer as described below.

Optimizing model parameters requires defining an objective function that is not too time-consuming to compute. We therefore excluded the triad experiments from the objective function, and solely modeled data from the dyad experiments. For each dyad experiment, we computed consonance profiles for models and participants as described earlier. We found that correlation metrics were a poor measure of qualitative fit between models and participants, so we instead measured fit by comparing the peaks of the model and participant profiles. Participant profile peaks were computed as previously; model profile peaks were obtained analogously, except that, because the model outputs are deterministic and cannot be bootstrapped, we filtered excess peaks by increasing the peak-finding minimum depth parameter (β) from 0.01 to 0.05 rather than relying on bootstrapping to remove unreliable peaks. We then defined model fit as the overlap between participant peaks and model peaks. In particular, we characterized a pair of participant and model peaks as overlapping if they were separated by less than 2.67% of the overall interval range in that study (i.e., 0.4 semitones in the 15-semitone experiments, or 0.013 semitones in the 0.5-semitone experiments), and then computed the Jaccard similarity between the sets of participant and model peaks. The overall model fit was then calculated as the mean Jaccard similarity across all the dyad experiments.
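The peak-matching objective can be sketched as follows (our code; the greedy one-to-one matching is our reading of "overlapping", and `tolerance` corresponds to 2.67% of the interval range, e.g. 0.4 semitones in the 15-semitone experiments):

```python
def jaccard_peak_fit(participant_peaks, model_peaks, tolerance):
    """Match each participant peak to at most one model peak lying
    within `tolerance`, then return the Jaccard similarity:
    matches / (participant peaks + model peaks - matches)."""
    model_free = list(model_peaks)
    matches = 0
    for p in participant_peaks:
        for m in model_free:
            if abs(p - m) < tolerance:
                model_free.remove(m)  # each model peak matched once
                matches += 1
                break
    union = len(participant_peaks) + len(model_peaks) - matches
    return matches / union if union else 1.0
```

For example, with participant peaks at 0, 3, and 7 semitones and model peaks at 0.1 and 7.2, two pairs overlap at a 0.4-semitone tolerance, giving a Jaccard similarity of 2/3.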

We optimized this objective function over the parameters listed in Supplementary Table 3 using the subplex algorithm88 as implemented in the NLopt package (http://github.com/stevengj/nlopt), with the parameter bounds specified in Supplementary Table 3. The optimization converged after 245 iterations to a mean Jaccard similarity of 0.467. The resulting model parameters (Supplementary Table 3) are used for all visualizations and analyses involving the composite model.
