Quantification and statistical analysis

Emilio Kropff; James E. Carmichael; Edvard I. Moser; May-Britt Moser

Improve Research Reproducibility A Bio-protocol resource

Home
Protocols

Concise Method

Quantification and statistical analysis

EK Emilio Kropff

JC James E. Carmichael

EM Edvard I. Moser

MM May-Britt Moser

This method is extracted from research article: Neuron, Mar 2021

Frequency of theta rhythm is controlled by acceleration, but not speed, in running rats

DOI: 10.1016/j.neuron.2021.01.017

Request a Protocol

Ask a question

Favorite

One channel of every microdrive (with 4 tetrodes) was used for LFP recordings. Sessions where theta oscillations had low amplitude were rejected so as to optimize the signal-to-noise ratio in our analyses, allowing for a correct detection of instantaneous theta phase and frequency. Sessions were rejected if the power spectrum at 8 Hz (typically the theta peak) was less than twice the power spectrum at 6 Hz or 12 Hz (typically the throughs in LFP around the theta peak) for the same session. In recordings where both entorhinal drives had LFPs that passed the criterion, only the one with the highest ratio of P₈ / max(P₆,P₁₂) was used, where P_f represents the power spectrum at a frequency of f Hz.

Spike sorting was performed offline using graphical cluster-cutting software (Tint; Axona Ltd). Clustering was performed manually in two-dimensional projections of the multidimensional parameter space (consisting of waveform amplitudes), using autocorrelation and cross-correlation functions as additional separation tools and separation criteria. In general, the stability of the tetrodes allowed for all sessions in a day to be merged for clustering purposes.

Theta phase was obtained from the Hilbert transform of the band-pass filtered LFP signal (between 6 and 12 Hz) and unwrapped by adding or subtracting 360° every time a jump in phase with absolute value higher than 180° occurred between two consecutive samples. Note that this procedure filtered out high frequencies, such as harmonics of theta. For this reason, potential variations in the shape of the theta cycle, caused by variations in the relative weight of different harmonics, did not affect our estimation of instantaneous theta frequency. In the bottomless car experiments, where exact repetitions of trajectory allowed for averaging over multiple trials, the frequency at time t was estimated from the average in phase difference between t + 60 ms and t – 60 ms. In a similar way, speed was estimated as the difference in position between t + 100 ms and t – 100 ms, while acceleration was estimated from the difference in speed between t + 100 ms and t – 100 ms, which is equivalent to the position at time t + 200 ms plus the position at time t – 200 ms minus twice the position at time t. Both smoothing windows (60 ms and 100 ms) were the smallest necessary to make the average signal visually clear in terms of signal to noise (Figure 1B).

In free foraging experiments, stronger smoothing along the temporal dimension was required in order to rely on single-trial estimations of frequency, speed and acceleration. Instantaneous theta frequency was computed by applying a Kalman filter and smoother (RTS) (MATLAB toolbox https://www.cs.ubc.ca/∼murphyk/Software/Kalman/kalman.html) to the phase obtained through the Hilbert transform (as in the previous case). The hidden state vector was composed by theta phase and its derivative, instantaneous theta frequency, while the observation vector included only theta phase. The system matrix was set to $F = | \begin{array}{l} 1 & 1 \\ 0 & 1 \end{array} |$ and the observation matrix to $H = | \begin{array}{l} 1 & 0 \end{array} |$ . The parameters controlling the amount of smoothing were $Q = | \begin{array}{l} 1 & 0 \\ 0 & 1 \end{array} |$ (system covariance) and $R = 5$ (observation covariance). An evaluation of this smoothing scheme can be found in Figure S5.

In 2-dimensional free foraging experiments, the x and y components of the velocity (v_x, v_y) and acceleration (a_x, a_y) vectors were computed from the tracked positions adapting the same Kalman filter. The hidden state vector was composed by position and its derivatives up to second order X_hidd = [x, v_x, a_x, y, v_y, a_y], while the observation vector included only the positions tracked by LEDs, X_obs = [x,y]. The system matrix was set to $F = | \begin{array}{l} A & Z_{3,3} \\ Z_{3,3} & A \end{array} |$ , where Z_3,3 denotes a 3 × 3 matrix filled with zeros and $A = | \begin{array}{l} 1 & 1 & 0.5 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{array} |$ . The observation matrix was $H = | \begin{array}{l} 1 0 0 0 0 0 \\ 0 0 0 1 0 0 \end{array} |$ . While these matrixes define the kinematics of any system with speed and acceleration in 2-dimensions, the crucial smoothing parameters were $Q$ (system covariance), a matrix filled with zeros and diagonal elements [50, 0.02, 0.01, 50, 0.2, 0.01], and $R = | \begin{array}{l} 100 & 0 \\ 0 & 100 \end{array} |$ (observation covariance). An evaluation of this smoothing scheme can be found in Figure S5. The same Kalman filter parameters were used in a previous publication (Kropff et al., 2015).

In addition, episodes of resting, grooming or other non-navigational behaviors were filtered out. To do so, instantaneous traveled distance at time t was defined as the distance between the positions of the animal at time t+0.5 s and t-0.5 s. We only considered data points where the instantaneous traveled distance was consistently higher than 5 cm in a window of 0.5 s around timestamp t.

Due to destructive interference, windows of LFP oscillations cut around an event such as acceleration or deceleration onset cannot be directly averaged. To obtain a reliable estimation of temporal evolution of the LFP averaged across multiple trials while avoiding destructive interference, the following procedure was used. Raw LFP traces were first filtered using a 5 Hz high-pass filter to remove slow oscillation effects associated to acceleration (data not shown). Windows of 1 s were then cut around acceleration or deceleration onsets. Each window contained 4800 samples and could be represented by a single point in a space with dimension N = 4800. The collection of all windows (points in a high dimensional space) corresponding to either acceleration or deceleration events were clustered using the k-means algorithm implemented in the MATLAB (MathWorks, MA) function kmeans(). To have a good coverage of all theta phases, the number of clusters was set to k = 20. Cluster centroids (i.e., averages of windows with similar data) representative of at least 20 data windows were used for visualization of mean theta oscillation behavior in response to acceleration or deceleration (Figure 1C).

To estimate intrinsic firing frequency in small windows of data, a parametric model of the lag between spikes was used (Climer et al., 2015). This method has been shown to be more robust than others based on spike-time autocorrelograms when the overall number of spikes is very low. For every cell, the 4 ms bin histogram of spike lags (pooling together lags between consecutive and non-consecutive spikes) was obtained. If properly normalized, the spike-lag histogram represents an estimation of probability of a cell to spike at time t given that it spiked at time 0. To analyze the effect of different degrees of speed or acceleration, only spikes passing a certain criterion, such as occurring at a given range of speeds or within a window of time after some event, were used to compute the spike-lag histogram. Next, spike-lag histograms for all cells in a given group were averaged to obtain the population spike-lag histogram of that group (Figure S2D).

The theta frequency associated with a population spike-lag histogram was estimated as parameter F of the equation

which was fitted to each histogram using the MATLAB function fit() over the range t = 16 to 300 ms. Note that the equation is versatile in the sense that it includes the possibility of a linear or an exponential decay of the non-oscillatory part of the histogram, covering a wide range of situations. To allow for visual comparisons across conditions or cell types, the population spike-lag histograms and the fitted curves were normalized by f(0) = A+C+E.

A similar procedure was used to estimate the intrinsic firing period of cells in terms of LFP theta phase (Figure S4). To do this, spike-lag histograms were constructed with differences in spike unwrapped theta phase rather than with differences in spike times, and 3° bins were used. Note that unwrapped theta phase is monotonically increasing, and thus analyses of these data use linear rather than circular statistics.

In some cases, when acceleration events were steep and short (Figures 2A and 2B), the size of the temporal windows from which spikes were included was critical. To get an estimate of the highest transient acceleration levels, windows had to be as short as possible (in the order of tens of milliseconds). However, the method needs substantial numbers of spikes, which would not be available using too short windows, as well as at least one full theta period. We used 500 ms as a compromise between including as many spikes as possible on one side and narrowing the window to capture only spikes that happened when acceleration was high on the other. Given that acceleration episodes could last less than 500 ms (e.g., Figures 1D and 1E use windows of 200 ms), the intrinsic firing frequencies that we report for isolated positive acceleration events are underestimations.

Chance-level statistics was constructed for a given variable W through a shuffling procedure. For each scenario described below, data were shuffled N times (N specified in the main text) and a surrogate value of W was calculated on each of the shuffling instances. The collection of N surrogate W values constituted the chance level distribution of W. A direct comparison between the actual value of W and its chance level distribution was used to obtain the associated p value.

Every run had a single acceleration or deceleration window. In every shuffling step, this window was displaced to fall randomly either inside the fast or slow constant running speed periods of the same run, avoiding resting periods. In a single shuffling step the displacement was different for different runs. The shuffled distribution was constructed with equal contributions of slow and fast running windows (Figures 2A and 2B).

To find the baseline distribution of intrinsic firing frequency for various populations of cells and levels of speed or acceleration, the windows of selected data were displaced in time by a random interval not shorter than 20 s and not longer than the total duration T of the session where the cell was recorded minus 20 s. For resulting times greater than T, modulo T was applied. In this way, windows randomly fell in any part of the session avoiding 20 s to each side of their actual temporal location (Figures 3D–3F). In a single shuffling step, different cells were displaced by a different interval, related to each cell’s T. For plots in Figure 3, distributions of shuffles corresponding to positive and negative acceleration windows were pooled together for visualization purposes, but analyses and p values were obtained considering only the corresponding shuffled distribution.

Segments of constant running speed where classified by their speed group (7, 14, 21 or 28 cm s^-1). Modulation was assessed by correlating intrinsic theta frequency and speed group across segments. Each shuffling step consisted of randomly rearranging the speed-group tags (Figure S3).

The spikes in each run were displaced in time by an interval smaller than the duration of the run T, and a modulus T was applied for spikes with timestamps larger than T, thus avoiding resting periods. Although T was roughly the same for all runs, the displacement was pseudo-randomly selected for each run in an independent way. Maps, correlations and other measures were computed with the new timestamps for each run.

Episodes of resting, grooming or other non-navigational behaviors were removed by defining the instantaneous traveled distance at time t as the distance between the positions of the animal at time t+0.5 s and t-0.5 s, considering for our analyses data points where the instantaneous traveled distance was consistently higher than 5 cm for half a second around the timestamp t.

Color maps in Figure 3A where obtained from a histogram of theta frequency as a function of the 2D acceleration vector, which was represented in polar coordinates through its magnitude a and its direction relative to the direction of velocity θav. Velocity was represented as always pointing to the north, so that θav = 0, when acceleration and velocity point in the same direction (positive acceleration), was represented in the north part of the plot. The width of angular bins was 3.6° and that of acceleration bins was 20 cm s^-2. Since data were distributed in a large number of bins with low coverage in each one, especially for high acceleration, color maps were smoothed using a Gaussian filter with a standard deviation of 30 cm s^-2 and10°.

For analyses of positive and negative open field acceleration, such as maps in Figure 3B, it only makes sense to consider episodes when cos(θav) is close to either 1 (positive acceleration; North) or −1 (negative acceleration; South). We thus defined linear acceleration in the open field as a^∗sign(cos(θav)) only for timestamps where |cos(θav)| > 0.9 (black dashed lines in Figure 3A). Plots of frequency versus acceleration (e.g., Figure 3B) included only data where the sign of acceleration was properly defined, and no smoothing was applied in the plot. All analyses, such as those in Figures 3D–3F, also included only timestamps where the sign of acceleration was properly defined.

Descriptive scores for each cell type and shuffling procedures to obtain 99^th percentile classification thresholds are identical to those used in Kropff et al. (2015). We here summarize the procedures to obtain the scores.

A grid score was determined for each cell from a series of expanding circular samples of the cell's spatial autocorrelogram (2.5 cm bins), each centered on the central peak but with the central peak excluded (Sargolini et al., 2006; Stensola et al., 2012). The radius of the central peak was defined as either the first local minimum in a curve showing correlation as a function of average distance from the center, or as the first incidence where the correlation was under 0.2, whichever occurred first. The radius of the successive circular samples was increased in steps of 1 bin (2.5 cm) from a minimum of 10 cm more than the radius of the central peak, to a maximum of 90 cm. For each sample, we calculated the Pearson correlation of the ring with its rotation in α degrees first for angles of 60° and 120° and then for angles of 30°, 90° and 150°. We then defined the minimum difference between any of the elements in the first group (60° and 120°) and any of the elements in the second (30°, 90° and 150°). The cell’s grid score was defined as the highest minimum difference between group-1 and group-2 rotations in the entire set of successive circular samples.

The head-direction score or mean vector length of a cell's circular firing rate distribution was computed as the absolute value of its head-direction tuning map’s angular mean |∑ h_k exp(-i θ_k)|, where θ_k is the k^th bin’s angle and h_k the corresponding firing rate normalized by the overall mean firing rate (Boccara et al., 2010).

A border score was computed for each cell as the difference between the maximal length of a wall touching on any single firing field of the cell and the average distance of the field from the nearest wall, divided by the sum of those values (Solstad et al., 2008). The range of border scores was thus −1 to 1. Firing fields were defined as collections of neighboring pixels with firing rates higher than 20% of the cell’s peak firing rate and a size of at least 200 cm².

The cell's speed score was computed as the Pearson correlation between instantaneous firing rate and instantaneous running speed, excluding segments of running at a speed < 2 cm s^-1 (Kropff et al., 2015).

Spatial information rate (Skaggs et al., 1996) for a cell was computed as ∑ p_k λ_k log₂(λ_k), where p_k is the k^th spatial bin’s occupancy probability and λ_k the corresponding firing rate normalized by the overall mean firing.

A cell fell under the category of putative fast spiking if its mean firing rate was above 10 Hz and its spike width below 0.3 ms (Buetfering et al., 2014; Kropff et al., 2015).

The theta frequency model in Figure 5B included the two key ingredients of rectification and slow decay. Rectified acceleration was defined as a_R = a for a > 0 and a_R = 0 otherwise, where a is modeled or tracked acceleration. Slow decaying rectified acceleration a_S was defined for the first timestamp t₁ as a_R(t₁). For any other time bin t_i, it was defined as the largest value between a_R(t_i) and exp(- 0.1)^∗a_R(t_i-1), which given the sampling rate of 50 Hz introduced an effective exponential decay with a constant of 0.2 s, only for episodes where acceleration was negative or decayed faster than that. Finally, modeled theta frequency for any timestamp t_i was defined as θ(t_i) = 8 Hz + a_S(t_i) / 100, with an increase of 1 Hz for every 100 cm s^-2 of positive acceleration. Note that we do not claim this model to be the best possible fit to our data but we utilize it for illustrative purposes given its simplicity.

The curve describing theta frequency as a function of time was filtered with low-pass, high-pass or wavelet filters, to capture different dynamical regimes in theta frequency temporal variations. Whenever the zero frequency component (i.e., the mean theta frequency) was filtered out (high-pass or wavelet filters), it was added back to the resulting signal to facilitate visual comparisons. This is why the instantaneous components in Figures 6A and 6E vary around 8 Hz rather than around 0 Hz.

Spatial information of grid maps was assessed in the same way as for place maps (see “cell type classification” section). Stability was computed as the correlation between the spatial maps obtained in the first and second half of the session for each cell. For decoding, all grid cells recorded in rat 14740 in either right or left runs were used, pooling together even if recorded in different days (the reproducible spatial trajectory allowed for this). Population activity vectors for the first and the second halves of each session were constructed for all 2.5 cm bins along the track. Those obtained from the first half of each session were used as templates of expected population firing rate for each bin, while those obtained from the second half were used to decode position at each bin by choosing the template that maximized Pearson correlation with the bin’s population vector. Decoding errors were defined as decoded positions that were further away than 25 cm from the actual bin.

Statistical tests were in general non-parametric, except where indicated, and whether they were one or two-sided is indicated in each case. In many cases the baseline statistics was assessed through shuffling, as described in the corresponding section of the STAR methods.

Theta frequency is here assumed to depend on acceleration alone. The aim is to understand the emergence of a spurious correlation with speed, dependent on the form of the relationship between theta frequency and acceleration. The Pearson product-moment coefficient is here used to assess correlation between any two variables x and y with corresponding standard deviations σ_x and σ_y. It can be thought of as

where the covariance CV(x,y) is

E() denoting the expected value. Since the standard deviations are always finite and positive, the sign of the correlation is given by that of the covariance.

The covariance between speed (v) and acceleration (a) over a temporal window of size T can be obtained by replacing expected values by temporal averages,

where $\overline{v} = \frac{1}{T} \int_{0}^{T} v dt$ is the mean speed. The integrals can be solved using the fact that the acceleration is the derivative of speed,

If initial and final speed are the same, the covariance is strictly zero. Alternatively, since there is a limit for the speed of a rat, both terms (the difference and the squared difference between initial and final speed) are bounded, and vanish for large enough T. Hence, if the window under consideration is large enough, the covariance (and thus the correlation) between speed and acceleration are zero.

If theta frequency F is modeled as a linear function of acceleration

then $E (F v) = A E (v) + B E (a v)$ and $E (F) E (v) = A E (v) + B E (a) E (v)$ . The covariance between theta frequency and speed would be

which, as explained before, vanishes for large enough T. Hence, if theta frequency behaved as a linear function of acceleration, its correlation with speed would be zero.

Let a⁺ be the positive-only or rectified acceleration, given by

With this definition,

where k is an index that runs through all segments of data with positive acceleration. Each segment starts at a speed trough with value v(t^I_k)and ends at the next consecutive speed peak with value v(t^F_k) > v(t^I_k). Note that since the contribution of each segment is strictly positive and since the number of segments scales in the general case with T, the fact that T is large is no longer enough to make E(a⁺v) vanish, so this term will be significantly positive. Similarly,

is also positive and cannot be presumed to vanish with large T.

Merging Equation 1 and Equation 2,

The first factor in each term of the sum over k in the right hand side is the (positive) peak depth, while the second factor can be positive or negative depending on whether the half-height of each speed peak is higher or lower than the average speed. Thus, the sign of the covariance between rectified acceleration and speed is in principle different from zero and can be either positive or negative. Note, however, that in typical experiments rats spend most of their time running at low speeds, with very transient excursions into high speed regions. This situation favors low average speed and high peak half-height, resulting in an overall positive covariance.

If theta is modeled as a linear function of the rectified acceleration, the arguments of the previous subsection apply, and the sign of the correlation between theta frequency and speed will also be significantly positive.

In sum, the asymmetrical modulation of theta by positive, but not negative, acceleration is enough to create spurious correlations between speed and theta frequency that do not vanish with the size of the dataset. These non-vanishing correlations are positive rather than negative due to the preference of low speeds in typical rat behavior.

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

Post a Question

0 Q&A

Share your protocol with your peers.

Submit a Preprint Protocol