Measuring optimism at choice

Timothy H. Muller
James L. Butler
Sebastijan Veselic
Bruno Miranda
Joni D. Wallis
Peter Dayan
Timothy E. J. Behrens
Zeb Kurth-Nelson
Steven W. Kennerley

We indexed the nonlinearity in the firing rate as a function of reward using a measure analogous to that used in Dabney et al.15. We measured the ‘reversal point’ of a neuron by estimating the value at which that neuron’s response is the same as (or reverses from positive to negative deviation from) the mean firing rate across trials after the presentation of the value-predicting cue (in the analysis window).

Unlike in dopaminergic neurons, the reversal point here is induced by z-scoring the data (mean firing rate in the analysis window after stimulus onset) within a neuron and across trials, and is therefore not exactly the same as the reversal point from baseline (pre-stimulus onset) firing, as used in Dabney et al.15. This is necessary because deviation in the firing rate from baseline in cortical neurons does not have the same assumed meaning as it does in dopaminergic neurons. In dopaminergic neurons, it is assumed that positive and negative deviations from baseline firing rate equate to positive and negative RPEs being signaled by that neuron15,16. However, in the cortex, many probability-selective neurons will, for example, increase their firing rate (relative to the pre-cue baseline) in response to all values (that is, even those at the lowest part of the reward distribution, which ought to elicit negative RPEs even in the most pessimistic neurons). Hence, unlike in dopaminergic neurons, in the cortex an increase in the firing rate relative to the baseline does not necessarily mean a positive RPE (Extended Data Fig. 2). We therefore measured the reversal point for each neuron by z-scoring the data in a window after feedback, so that we could compare the measures of optimism across neurons (this z-scoring results in neutral neurons having a reversal point of 2.5, with reversal points >2.5 and <2.5 indicating optimism and pessimism, respectively). The reversal point is estimated by linearly interpolating between the neighboring negative and positive state values and is defined as the value at which that interpolation crosses zero deviation from the mean firing rate (Fig. 1b). If a neuron is optimistic and thus predicts the highest values in the range of the task, the firing rate to all values but the highest value will be low relative to that of the highest, hence the reversal point will be high (Fig. 1b, left). We used this reversal point measure for consistency with Dabney et al.15.
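As a rough illustration of the procedure described above, the following Python sketch z-scores one neuron's firing rates across trials, averages them per value level, and linearly interpolates the point where the z-scored response crosses zero. The function name `reversal_point`, the toy firing rates, and the choice to return the first sign change are assumptions made for this example; this is not the authors' analysis code.

```python
from statistics import mean, pstdev

def reversal_point(values, rates):
    """Estimate a reversal point for one neuron (illustrative sketch).

    values: value level presented on each trial (e.g. 1..4)
    rates:  firing rate in the analysis window on each trial
    Returns the interpolated value at which the z-scored response
    crosses zero, or None if there is no crossing.
    """
    mu, sd = mean(rates), pstdev(rates)
    if sd == 0:
        return None  # constant firing: z-score undefined
    z = [(r - mu) / sd for r in rates]

    # Mean z-scored response at each distinct value level.
    levels = sorted(set(values))
    mean_z = [mean([zi for v, zi in zip(values, z) if v == lv])
              for lv in levels]

    # Assumption: use the first pair of neighboring levels whose mean
    # responses have opposite signs, and interpolate linearly to zero.
    for i in range(len(levels) - 1):
        z0, z1 = mean_z[i], mean_z[i + 1]
        if z0 == 0:
            return float(levels[i])
        if z0 * z1 < 0:
            frac = -z0 / (z1 - z0)
            return levels[i] + frac * (levels[i + 1] - levels[i])
    return float(levels[-1]) if mean_z[-1] == 0 else None

# Toy neurons, one trial per value level for brevity.
neutral = reversal_point([1, 2, 3, 4], [1.0, 2.0, 3.0, 4.0])
optimistic = reversal_point([1, 2, 3, 4], [1.0, 1.0, 1.0, 5.0])
pessimistic = reversal_point([1, 2, 3, 4], [1.0, 5.0, 5.0, 5.0])
```

In this toy setup the linearly responding neuron lands at 2.5, the neuron that responds mainly to the highest value lands above 2.5, and the neuron that responds to everything but the lowest value lands below 2.5, matching the interpretation given in the text.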

However, we noted that an alternative measure of optimism capturing the nonlinear shape of the neuronal response as a function of reward yielded qualitatively the same results (Extended Data Fig. 3) and was highly correlated with the reversal point. This measure is obtained by fitting the nonlinearity in the firing rate as a function of reward using a quadratic term in linear regression:

FR = β0 + β1·R + β2·R²

where FR is the firing rate on each trial and R the reward level. β2 is a regression weight that indexes optimism via the concavity (or convexity) of the function. As expected, this measure of optimism is highly correlated with the reversal point described above (R = 0.87, P = 4.0 × 10^−37 by Pearson’s correlation), corroborating that both measures index the nonlinearity in the firing rate as a function of reward.
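The quadratic fit can be sketched in a few lines of Python. The example below assumes an ordinary least-squares regression of firing rate on an intercept, the reward level and its square, solved through the normal equations. The function name `fit_quadratic` and the toy data are hypothetical, and the authors' actual regression may have included details not stated here.

```python
def fit_quadratic(rewards, rates):
    """Least-squares fit of rate = b0 + b1*R + b2*R**2 (illustrative).

    Returns [b0, b1, b2]. Needs at least three distinct reward levels,
    otherwise the normal equations are singular.
    """
    # Design matrix with columns 1, R, R^2.
    X = [[1.0, float(r), float(r) ** 2] for r in rewards]

    # Normal equations (X^T X) beta = X^T y, stored as an augmented matrix.
    M = [[sum(row[i] * row[j] for row in X) for j in range(3)]
         + [sum(row[i] * y for row, y in zip(X, rates))]
         for i in range(3)]

    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]

    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = sum(M[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (M[i][3] - s) / M[i][i]
    return beta

# Toy neuron that responds mainly to the highest reward level:
# the fitted curve is convex, so the quadratic weight is positive.
b0, b1, b2 = fit_quadratic([1, 2, 3, 4], [1.0, 1.0, 1.0, 5.0])
```

For this toy neuron the quadratic weight comes out positive (a convex response), the same neuron shape that produces a high reversal point, which is consistent with the positive correlation between the two measures reported above.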

Such nonlinear responses have recently been shown to arise from normalized RL, wherein rewards are represented by a normalized objective function inspired by a canonical divisive normalization computation27. Such normalization may be particularly relevant to cortical neurons. Importantly, it also offers a mechanism for how nonlinear reward coding compatible with distributional RL may arise in a biologically plausible manner and, furthermore, how this may naturally give rise to distributional RL27. This work therefore provides a deeper possible explanation and mechanism for how the effect captured by our reversal point and quadratic β measures may arise and result in distributional coding.

In terms of what reversal points we expect to see in our data, we noted that, although the probability distribution over value was uniform (each of the four value levels was equally likely to be presented at choice on a given trial), this did not necessarily mean that we expected the measured reversal points to be a uniform distribution. This is because the learned reversal points arising from distributional RL are predicted to correspond to expectiles of the reward distribution (Dabney et al.15). Therefore, we did not expect the measured reversal points (in Fig. 1c) to be uniform; we did, however, expect them to exhibit consistent diversity (as shown in Fig. 1d,e).
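To make the expectile argument concrete, the sketch below computes expectiles of a uniform distribution over four value levels by bisection on the asymmetric squared-error condition. The levels 1 to 4, the function name `expectile`, and the evenly spaced asymmetries τ are assumptions chosen purely for illustration; the distribution of asymmetries across real neurons is not specified here.

```python
def expectile(samples, tau, tol=1e-10):
    """tau-expectile of equally weighted samples, found by bisection.

    The expectile e satisfies
        tau * sum(x - e for x > e) == (1 - tau) * sum(e - x for x <= e),
    so tau = 0.5 recovers the mean.
    """
    lo, hi = float(min(samples)), float(max(samples))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Asymmetrically weighted residual sum; it decreases as mid grows
        # and is positive when the expectile lies above mid.
        g = sum((tau if x > mid else 1 - tau) * (x - mid) for x in samples)
        if g > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

levels = [1, 2, 3, 4]                 # four equally likely value levels
taus = [0.1, 0.3, 0.5, 0.7, 0.9]      # hypothetical, evenly spaced asymmetries
expectiles = [expectile(levels, t) for t in taus]
gaps = [b - a for a, b in zip(expectiles, expectiles[1:])]
```

With these assumed asymmetries the expectiles come out at approximately 1.5, 2.1, 2.5, 2.9 and 3.5, so the gaps between neighbors are unequal even though the underlying value distribution is uniform.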
