In some simulations, we also modeled input to the hidden layer units of each st-RNN module from a GI layer (10 × 8 = 80 neurons; Fig. 4A). The GI layer received strong convergent input from the input layer units, and its output was obtained by topographic spatial convolution of the input layer activity with a box filter of size 11 × 11, followed by nearest-neighbor downsampling to 10 × 8 resolution, and binarization by rounding. These input weights to the GI layer were not trainable (fixed weights). The resultant 10 × 8 binary map was treated as the output of the GI layer. No recurrent connections occurred within the GI layer. The GI layer projected to the hidden layer neurons (recurrent units) of the st-RNN modules through inhibitory connection weights (Ug; Eq. 3, below). These weights were trainable and were randomly initialized before training. The hidden layer unit activations were then modeled as:

$$\mathbf{h}_t = f\!\left(\mathbf{W}\,\mathbf{h}_{t-1} + \mathbf{U}\,\mathbf{x}_t - \mathbf{U}_g\,\mathbf{g}_t + \mathbf{b}\right) \quad (3)$$

where $\mathbf{g}_t$ is the output of the GI units, $\mathbf{U}_g$ is the inhibitory connection matrix from the GI units to the hidden layer units of the st-RNNs, and the remaining terms ($\mathbf{W}$, recurrent weights; $\mathbf{U}$, input weights; $\mathbf{x}_t$, input; $\mathbf{b}$, bias; $f$, the hidden-unit nonlinearity) follow the st-RNN hidden layer update. Note that only one st-RNN module was trained along with the GI layer. For modeling images with tiled st-RNNs, GI layer weights were replicated across all tiled st-RNN modules. Simulations in Figures 4, 7, and 8 were performed with this version of the network incorporating the GI layer.
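For concreteness, the fixed GI transform and the hidden layer update of Equation 3 can be sketched as follows in NumPy. This is a minimal illustration, not the trained implementation: the zero padding, the rectifying (ReLU) nonlinearity, and all function and variable names are assumptions.

```python
import numpy as np

def gi_output(input_activity, k=11, out_shape=(10, 8)):
    """Fixed (non-trainable) GI transform: topographic box-filter convolution
    of the input layer activity, nearest-neighbor downsampling to 10 x 8,
    and binarization by rounding. Zero padding is an assumption."""
    H, W = input_activity.shape
    pad = k // 2
    padded = np.pad(input_activity.astype(float), pad, mode='constant')
    # 11 x 11 box filter (local average), applied topographically
    conv = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            conv[i, j] = padded[i:i + k, j:j + k].mean()
    # Nearest-neighbor downsampling to the GI grid resolution
    rows = np.round(np.linspace(0, H - 1, out_shape[0])).astype(int)
    cols = np.round(np.linspace(0, W - 1, out_shape[1])).astype(int)
    # Binarize by rounding the local-average activity
    return np.round(conv[np.ix_(rows, cols)])

def hidden_update(h_prev, x_t, g_t, W, U, Ug, b):
    """Hidden layer update with GI inhibition (Eq. 3 sketch): recurrent drive
    plus feedforward input, minus the inhibitory GI drive (Ug @ g_t), passed
    through a rectifying nonlinearity (assumed here to be ReLU)."""
    return np.maximum(0.0, W @ h_prev + U @ x_t - Ug @ g_t + b)
```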
Global inhibition enables change detection with natural images. A, Top, Schematic of change detection with a representative natural image (resolution: 1024 × 768), interspersed with blanks. Red rectangle: location of change (not part of the image). Bottom, 8 × 8 st-RNN modules tiled to represent the full-resolution image (overlapping blue patches in both input and output maps). st-RNN modules were tiled with 50% overlap along both the horizontal and vertical directions, such that each 4 × 4 patch in the image (except for patches closest to the border) was processed by four different st-RNN modules. Orange outline: global inhibition (GI) layer, mimicking the architecture of the Imc connection (Fig. 1A, orange nucleus). Gray lines: convergent, topographic connections from the input to the GI layer; orange circles: inhibitory connections from the GI layer to both E and I neurons in the hidden layer of each st-RNN module; dashed connections: recurrent excitatory connections from E neurons in the hidden layer to neurons in the GI layer (in blue) and recurrent inhibitory connections among neurons in the GI layer (in orange); these connections were implemented in one variant of the network incorporating the GI layer (Materials and Methods). B, Topmost row, Thresholded, binarized saliency map around the region of change (red box; see text for details). Second and third rows, The expected output (ground truth) of the mnemonic coding (MC) and change detection (CD) st-RNNs, respectively. Fourth and fifth rows, Output of the trained MC and CD st-RNN models before incorporating the global inhibition layer (−GI). Sixth and seventh rows, Output of the trained MC and CD st-RNN models after incorporating the global inhibition layer (+GI). For all rows, the middle and right columns represent the output of the respective st-RNN during the blank (B) and change image (A*) epochs, respectively. Red outlines: location of change. C, Analysis of a toy example with nine st-RNN modules tiled in a 3 × 3 square grid, with no overlap. Rows 1–3, Input to the st-RNN modules (1st row), and the expected outputs of the MC st-RNN (2nd row) and CD st-RNN (3rd row). Rows 4–9, Outputs of the trained MC and CD st-RNN models before incorporating the global inhibition layer (−GI; 4th and 5th rows), after incorporating local (short-range) recurrent interactions (+Li; 6th and 7th rows), and after incorporating the global inhibition layer (+GI; 8th and 9th rows). Other conventions are the same as in panel B. Top row, Red box: on-off transition; blue box: off-on transition. D, Left, Projection of hidden layer activity for each st-RNN (panel C) into the mnemonic subspace, in the absence of global inhibition (−GI). Trajectories begin from the first blank, when the first image was maintained (blue shaded dots; t = 3–10), through a transition corresponding to the presentation of the change image (lines with superimposed arrowheads), followed by the second blank, when the change image was maintained (green to yellow shaded dots; t = 13–20). Insets, Input images for each st-RNN module from the toy example (panel C). The st-RNN failed to accomplish the on-off transition (“plus” shape to blank) successfully (dashed purple arrow). Right, Activity of five representative hidden layer units, each represented by a different color, of the mnemonic coding st-RNN corresponding to the middle pattern (“plus”) from panel C. In the absence of global inhibition (−GI), unit activity failed to reset on presentation of the change image (A*).
Dots: data for each bin; dashed lines: spline fits; gray shaded bar: time point (t = 11) corresponding to presentation of the change image. E, Same as in panel D but in the presence of global inhibition (+GI). Left, The st-RNN accomplished the on-off transition (“plus” shape to blank) successfully (solid purple arrow). Right, In the presence of global inhibition (+GI) unit activity “reset” on presentation of the change image (gray shading).
A gaze model for simulating eye movements with the st-RNN model. A, A representative sequence of stimuli in the change blindness experiment. Images, with a key change between them (red box indicates the location of change), were alternated for 250 ms each, with intervening blank frames, also presented for 250 ms (“flicker” paradigm). In the laboratory experiment, participants were required to scan the image and detect the change within a fixed trial duration (60 s). B, Steps involved in simulating sequential fixations with the st-RNN model. Clockwise from top left, Following fixation (yellow dot), the image was foveally magnified with a CVR transform. Following this, a bottom-up saliency map was computed (Itti et al., 1998), then thresholded and binarized (for details, see Materials and Methods). The temporal sequence of binarized saliency maps was provided to the stacked st-RNN model to obtain the Change map (output of the change detection st-RNN; top right). The Saliency and Change maps were then fused along with an IOR map to obtain the final Priority map (bottom; Materials and Methods). This final map was converted into a probability density and used to sample the next fixation point.
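The final step of panel B (fusing the maps and sampling the next fixation) admits a simple sketch. The additive fusion rule, equal default weights, and function names below are illustrative assumptions; the paper's exact combination rule is described in its Materials and Methods.

```python
import numpy as np

rng = np.random.default_rng(0)

def next_fixation(saliency_map, change_map, ior_map, w=(1.0, 1.0, 1.0)):
    """Fuse the Saliency, Change, and IOR maps into a Priority map, convert
    it to a probability density, and sample the next fixation point. The
    additive fusion with subtractive IOR is an assumption for illustration."""
    priority = w[0] * saliency_map + w[1] * change_map - w[2] * ior_map
    priority = np.clip(priority, 0.0, None)           # keep priorities nonnegative
    p = priority.ravel() / (priority.sum() + 1e-12)   # normalize to a density
    idx = rng.choice(p.size, p=p)                     # sample a pixel index
    return np.unravel_index(idx, priority.shape)      # (row, col) of next fixation
```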
st-RNN based gaze model mimics and predicts human strategies in a change blindness task. A, Comparison of the gaze scan path of a representative human subject (top) versus that of a representative model trial (bottom) on an example image from the change blindness experiment. Red box: location of change. B, Correlation between model (x-axis; average across n = 80 iterations) and human data (y-axis; average across n = 39 participants) for the number of fixations (left) and the distance traveled (right) before fixating on the change region. Error bars: SEM. C, Same as in panel B, but for a model in which the priority map was computed after excluding the st-RNN output (see Materials and Methods). D, Distribution of four different saliency comparison metrics across 27 images for the fixation map predicted by the st-RNN model (x-axis) versus the map predicted by the Salicon algorithm (y-axis) (Huang et al., 2015). Clockwise from top: AUC (Borji), KL-divergence, CC, and similarity. Diagonal line: line of equality (x = y). For all metrics, except KL-divergence, a higher value implies a better match with human fixations. Insets, Distribution of the difference between the st-RNN and Salicon predictions for each metric. E, Same as in panel D but comparing fixation predictions of the st-RNN model (x-axis) against those of the Itti–Koch saliency prediction algorithm (y-axis). Other conventions are the same as in panel D.
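For reference, three of the four comparison metrics in panel D (CC, similarity, and KL-divergence) can be computed as in the sketch below, which follows the standard definitions from the saliency-benchmarking literature; it is an assumption that the paper's implementation matches these exactly, and AUC (Borji) is omitted for brevity.

```python
import numpy as np

def cc(map1, map2):
    """Pearson correlation coefficient (CC) between two maps (higher is better)."""
    a = (map1 - map1.mean()) / (map1.std() + 1e-12)
    b = (map2 - map2.mean()) / (map2.std() + 1e-12)
    return float((a * b).mean())

def similarity(map1, map2):
    """SIM: histogram intersection of the two maps, each normalized to unit
    sum (higher is better)."""
    p = map1 / (map1.sum() + 1e-12)
    q = map2 / (map2.sum() + 1e-12)
    return float(np.minimum(p, q).sum())

def kl_divergence(fix_map, pred_map, eps=1e-12):
    """KL divergence of the predicted map from the fixation density
    (lower is better)."""
    p = fix_map / (fix_map.sum() + eps)
    q = pred_map / (pred_map.sum() + eps)
    return float(np.sum(p * np.log(eps + p / (q + eps))))
```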
We also trained a variant of the network in which the GI layer neurons received topographic excitatory input from the st-RNN hidden layer excitatory (E) neurons; these weights (matrix $\mathbf{U}_h$; Eq. 4) were trainable. The GI layer comprised 100 neurons, organized in a 10 × 10 grid, and also contained all-to-all recurrent inhibitory connections (matrix $\mathbf{W}_g$; Eq. 4). As before, GI units inhibited the st-RNN hidden layer units through the weight matrix $\mathbf{U}_g$ (Eq. 3). The GI layer unit activations were modeled as:

$$\mathbf{g}_t = f\!\left(\mathbf{U}_c\,\mathbf{c}_t + \mathbf{U}_h\,\bar{\mathbf{h}}^{E}_{t-1} - \mathbf{W}_g\,\mathbf{g}_{t-1} + \mathbf{b}_g\right) \quad (4)$$

where $\mathbf{c}_t$ represents the convergent input from the input layer (10 × 10), and $\mathbf{U}_c$ is a trainable weight matrix that transforms the input to GI layer dimensions. For modeling high-resolution images with tiled st-RNNs, GI layer unit weights were shared across all tiled st-RNN modules, with $\bar{\mathbf{h}}^{E}_{t-1}$ representing the averaged hidden excitatory unit activations across all the tiled st-RNN modules. Simulations in Figures 5 and 6 were performed with this version of the network.
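A minimal sketch of the GI layer update of Equation 4 follows, under the same assumptions as the earlier sketch (reconstructed symbol names, ReLU nonlinearity, and one-step time indexing for the recurrent terms):

```python
import numpy as np

def gi_update(g_prev, c_t, hE_mean, Uc, Uh, Wg, bg):
    """GI layer update (Eq. 4 sketch). Uc @ c_t: trainable convergent input
    from the input layer; Uh @ hE_mean: topographic excitatory feedback from
    the (tile-averaged) hidden E-unit activity; Wg @ g_prev: all-to-all
    recurrent inhibition within the GI layer. Names, time indexing, and the
    ReLU nonlinearity are assumptions, not the paper's exact implementation."""
    return np.maximum(0.0, Uc @ c_t + Uh @ hE_mean - Wg @ g_prev + bg)
```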
Model unit responses to static, dynamic, and competing stimuli. A–D, Normalized mean activity of output units (n = 62,500) of the change detection st-RNN for static (A), moving (B), looming (C), and receding (D) stimuli. Mean activity was normalized by the maximum activation across all four stimulus classes. Insets, Input stimulus patterns for each respective simulation. Insets, First row, Input sequence corresponding to each stimulus type (A–D). Insets, Second and third rows, Mnemonic coding (second row) and change detection (third row) outputs corresponding to the respective stimulus type. E, Normalized mean activity of the change detection st-RNN output units averaged across the final seven frames (t = 3 to t = 10) for static (S), moving (M), looming (L), and receding (R) stimuli (see text for details). F, Same as in panels A–D except showing activity evoked by a slow-looming stimulus in the upper left quadrant when presented alone (“Single,” gray) or concurrently with a fast-looming stimulus (“Paired,” black). Inset, Paired input stimulus pattern. Other conventions are the same as in panels A–D. G, Same as in panel F, but showing activity evoked by the fast-looming stimulus. Other conventions are the same as in panel F. H, Suppression of the mean activity (percentage) for paired, as compared with single, stimuli across the last five frames (t = 2 to t = 6), for neurons representing the slow-looming (left bar) and fast-looming (right bar) stimuli, respectively.
Simulated microstimulation rescues change detection deficits. A, Top row, Simulated laboratory change blindness task. Two oriented gratings were presented, one in each visual hemifield. The entire image spanned 1000 × 800 pixels and was encoded with 50,000 overlapping st-RNN modules. Following the blank (B), a new change image occurred in which one of the gratings (here, the grating in the left hemifield) underwent a change in orientation. Middle row, Output of the mnemonic coding (MC) st-RNN. Bottom row, Output of the change detection (CD) st-RNN. Red box: location of change; shown for illustration only and not presented along with the visual input. Other conventions are the same as in Figure 4B. B, The output of the mnemonic coding (first row) and change detection (second row) st-RNNs following simulated, focal microstimulation of the right hemifield (no-change) grating representation alone (see text for details). C, Same as in panel B but following simulated, focal microstimulation of the left hemifield (change) grating representation alone. Other conventions are the same as in panel B. B, C, Blue box: location of simulated microstimulation; shown for illustration only and not presented along with the visual input. Blue horizontal bar: duration of microstimulation. D, Quantification of the change in performance following the simulated microstimulation experiments of panel B (top) and panel C (bottom), respectively. Top, Mean L1 error for units representing the right hemifield (no-change) grating without (gray dashed) or with (blue solid) simulated microstimulation. Dashed vertical lines: time of appearance of the changed image (A*). Other conventions are the same as in panel C. Bottom, Same as in the top panel but showing mean L1 error for units representing the left hemifield (change) grating. Other conventions are the same as in the top panel.