Mice were head-fixed using optical hardware (Thorlabs) and placed in a polypropylene tube to limit movement. Spout position was controlled by mounting the spout apparatus on a pressure-driven sliding linear actuator (Festo) controlled by two solenoids (Parker). Licks were detected using an infrared emitter/receiver pair (Digikey) mounted on either side of the retractable lick spout. Rewards consisted of 5–8 μl water and punishments consisted of a white noise auditory stimulus alone (early training) or white noise plus 1–3 μl of 5 mM quinine hydrochloride (Sigma) in water (late training). Behavioral training and testing were implemented with custom software written in MATLAB (MathWorks). Drifting grating stimuli were presented with the Psychophysics Toolbox (Brainard, 1997). Mice were trained during the light cycle. The stimulus consisted of sine wave gratings (spatial frequency: 0.05 cycles/deg; temporal frequency: 2 Hz) drifting at either 0 degrees (target) or 90 degrees (non-target) away from vertical. These stimuli were chosen to drive distinct groups of visual neurons with roughly equal strength (Figure 1—figure supplement 2). Thus, any large differences in stimulus selectivity observed in cortical neurons are unlikely to result from differences in stimulus strength.

Mice were trained in successive stages, with advancement to the next stage contingent on correct performance: 1) Mice received a reward any time they licked the spout. 2) Trial structure was introduced: an auditory cue tone was followed by a visual stimulus (100% targets) and then by an inter-trial interval. Mice were rewarded only for licks during the visual stimulus. 3) Once mice exhibited preferential licking during the stimulus compared with the inter-trial interval, the target rate was reduced over several sessions from 100% to 50%. At this point, the non-target was a static grating oriented orthogonally to the target. Licks during non-targets were punished with white noise or white noise plus quinine. 4) Once mice exhibited the ability to discriminate target and non-target gratings (d’ > 1 and RHIT − RFA > 30% for consecutive sessions, where RHIT and RFA are the hit and false alarm rates, respectively), the temporal frequency of the non-target grating was increased. 5) Spout withdrawal was introduced. At first the spout was extended within range before the stimulus appeared; the spout extension time was then gradually delayed until it occurred after the stimulus had turned off (i.e., 0 s delay). 6) Finally, the variable delay period was gradually increased to 0/3/6 s (imaging mice) or 0/2/4 s (photoinhibition mice). Mice that failed to fully learn the task within 150 sessions or showed signs of infection were removed from the study. In total, we removed 11 mice from the study before data collection was complete: 3 mice for failure to consistently lick the spout for reward (stage 1), 3 mice for failure to progress during the visual discrimination phase (stage 3), 1 mouse for failure to progress at the variable delay stage (stage 6), 1 mouse that showed signs of infection, and 4 mice that completed behavioral training but had either poor viral expression or cloudy windows after surgery.

Once mice reached high levels of performance at the final stage of the task (d’ > 1.5 and RHIT − RFA > 50%), they were removed from water restriction for window implantation. After recovery from window implantation surgery, they were re-trained to a high level of performance (2–7 days) before experimental sessions began. For both imaging and photoinhibition experiments, any sessions with poor performance were discarded (minimum performance criterion: d’ > 1 and RHIT − RFA > 30%). For photoinhibition experiments, the performance criterion was applied to the control condition.
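The performance criteria can be illustrated with a minimal sketch. It assumes the standard signal-detection definition d’ = Z(hit rate) − Z(false alarm rate) and, as a further assumption not stated in the text, clamps rates of exactly 0 or 1 by half a trial so that the inverse normal stays finite. The function names and the clamping rule are hypothetical, not the authors' MATLAB implementation.

```python
from statistics import NormalDist


def session_metrics(n_hit, n_miss, n_fa, n_cr):
    """Return (d_prime, hit_rate, fa_rate) from trial counts of one session."""
    n_target = n_hit + n_miss
    n_nontarget = n_fa + n_cr
    r_hit = n_hit / n_target
    r_fa = n_fa / n_nontarget

    def clamp(rate, n_trials):
        # Assumption: keep rates away from 0 and 1 by half a trial.
        return min(max(rate, 0.5 / n_trials), 1 - 0.5 / n_trials)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(clamp(r_hit, n_target)) - z(clamp(r_fa, n_nontarget))
    return d_prime, r_hit, r_fa


def passes_criterion(d_prime, r_hit, r_fa, min_d_prime=1.0, min_rate_diff=0.30):
    """Session criterion: d' and (hit rate - false alarm rate) both above threshold."""
    return d_prime > min_d_prime and (r_hit - r_fa) > min_rate_diff


# Example session: 80 hits, 20 misses, 20 false alarms, 80 correct rejections.
metrics = session_metrics(80, 20, 20, 80)
keep_session = passes_criterion(*metrics)                # minimum criterion
ready_for_surgery = passes_criterion(*metrics, 1.5, 0.50)  # final-stage criterion
```

For photoinhibition sessions, the same check would be applied to counts from control trials only.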

For analysis of movement during the delay period (Figure 1—figure supplement 1), cropped video frames (300 × 200 pixels; width × height) from Hit and CR trials were compared to a ‘template’ CR image to measure postural changes or changes (increases or decreases) in movement during each epoch of the task (Pre-stimulus, Delay, Response). Since some amount of movement is expected in all conditions, a pixel-wise map of the absolute difference between single CR frames and the within-condition CR template (DCR) was calculated as a measure of baseline movement:
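One form consistent with this description, offered as a reconstruction rather than the original expression, is given here. It assumes that the map is averaged over the N_CR frames of an epoch; the averaging and the symbols CR_f and N_CR are assumptions of this sketch.

```latex
D_{CR}(x,y) = \frac{1}{N_{CR}} \sum_{f=1}^{N_{CR}}
  \left| CR_f(x,y) - \frac{1}{N_{CR}-1} \sum_{F \neq f} CR_F(x,y) \right|
```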

Where f is the index of a single CR frame and F≠f is the set of all CR frames except f, and where x and y are pixel indices. The absolute difference map was calculated separately for each epoch (Pre-stimulus, Delay, Response). A pixel-wise map of the absolute difference between single Hit frames and the CR template (DHit) was calculated in the same manner:
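By analogy with the CR map, a plausible reconstruction is given here. It assumes an average over the N_Hit Hit frames, with Hit_f the f-th Hit frame and the template built from all CR frames except one; the symbols Hit_f, N_Hit and N_CR are introduced only for illustration.

```latex
D_{Hit}(x,y) = \frac{1}{N_{Hit}} \sum_{f=1}^{N_{Hit}}
  \left| Hit_f(x,y) - \frac{1}{N_{CR}-1} \sum_{F \neq f} CR_F(x,y) \right|
```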

In cases where the number of Hit frames exceeded the number of CR frames, the excluded frame was chosen at random from the CR frames. Finally, the difference in movement on Hit trials relative to CR trials (DSub) was calculated by taking the absolute value of the subtracted difference maps (Figure 1—figure supplement 1B):
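Written out from this description, with the LaTeX notation assumed here for the maps DHit, DCR and DSub, the subtraction step is:

```latex
D_{Sub}(x,y) = \left| D_{Hit}(x,y) - D_{CR}(x,y) \right|
```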

Note that since the frames are compared against a CR template, this approach will capture not only transient movement but also stable postural changes specific to Hit trials. To compare between sessions, the subtracted difference maps (DSub) were averaged across all pixels for each epoch (Figure 1—figure supplement 1C).
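The analysis for one epoch can be sketched as follows. This is an illustrative reading of the description, not the authors' code: it assumes that difference maps are averaged over frames, that each Hit frame is compared to a CR template with one CR frame left out (the frame with the same index where one exists, otherwise a random one), and that at least two CR frames are available. Frames are plain nested lists to keep the sketch dependency-free, and all function names are hypothetical.

```python
import random


def loo_template(cr_frames, skip):
    """Pixel-wise mean of all CR frames except the one at index `skip`."""
    kept = [f for i, f in enumerate(cr_frames) if i != skip]
    h, w = len(kept[0]), len(kept[0][0])
    return [[sum(f[y][x] for f in kept) / len(kept) for x in range(w)]
            for y in range(h)]


def difference_map(frames, cr_frames, rng=None):
    """Mean over frames of |frame - leave-one-out CR template|, per pixel."""
    rng = rng or random.Random(0)
    h, w = len(frames[0]), len(frames[0][0])
    total = [[0.0] * w for _ in range(h)]
    for i, frame in enumerate(frames):
        # Assumption: exclude the CR frame with the same index; if there are
        # more frames than CR frames, exclude a randomly chosen CR frame.
        skip = i if i < len(cr_frames) else rng.randrange(len(cr_frames))
        template = loo_template(cr_frames, skip)
        for y in range(h):
            for x in range(w):
                total[y][x] += abs(frame[y][x] - template[y][x])
    return [[v / len(frames) for v in row] for row in total]


def movement_index(hit_frames, cr_frames, rng=None):
    """Return (mean of DSub over pixels, DSub map) for one epoch."""
    d_cr = difference_map(cr_frames, cr_frames, rng)
    d_hit = difference_map(hit_frames, cr_frames, rng)
    d_sub = [[abs(a - b) for a, b in zip(row_hit, row_cr)]
             for row_hit, row_cr in zip(d_hit, d_cr)]
    n_pixels = sum(len(row) for row in d_sub)
    return sum(map(sum, d_sub)) / n_pixels, d_sub


# Toy 2 x 2 frames: CR frames are identical; Hit frames differ at one pixel.
cr = [[[0, 0], [0, 0]] for _ in range(3)]
hit = [[[2, 0], [0, 0]] for _ in range(2)]
index, d_sub = movement_index(hit, cr)
```

In the toy example, the single changed pixel is the only nonzero entry in the subtracted map, so a stable posture difference on Hit trials is captured even without frame-to-frame motion.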