For classification analyses, we used the Princeton Multi-Voxel Pattern Analysis Toolbox (www.pni.princeton.edu/mvpa). Specifically, we used subject-specific logistic regression classifiers penalized using L2-norm regularization (penalty = 1; preliminary analyses showed negligible influence of this parameter on the qualitative pattern of our results). We performed three-way (face/scene/blank) classification by learning weights for three logistic regression models during the training phase (discriminating TRs as face vs. not, scene vs. not, and blank vs. not, respectively) and then generating guesses during the test phase by labeling each TR according to the model with maximal output evidence. We verified in preliminary analyses that including the blank blocks and performing multi-way classification (as opposed to binary face vs. scene classification) did not affect the pattern of results.
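The one-vs-rest scheme described above can be illustrated with a minimal sketch. This is not the toolbox's implementation: the gradient-descent optimizer, the learning rate, the scaling of the penalty term, and the two-"voxel" toy data are all assumptions made for illustration, and the function names (`train_binary`, `train_one_vs_rest`, `predict`, `make_toy_data`) are hypothetical.

```python
import math
import random

CLASSES = ("face", "scene", "blank")

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def decision(w, b, x):
    """Linear evidence w.x + b for one TR pattern x."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_binary(X, y, penalty=1.0, lr=0.1, n_iter=500):
    """Fit one L2-penalized logistic regression (target vs. not).

    Assumed objective: summed log-loss + (penalty / 2) * ||w||^2,
    minimized by plain gradient descent; the bias is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w = [penalty * wj for wj in w]  # gradient of the L2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(decision(w, b, xi)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def train_one_vs_rest(X, labels, penalty=1.0):
    """Learn one binary model per category (face/scene/blank vs. not)."""
    models = {}
    for c in CLASSES:
        targets = [1.0 if lab == c else 0.0 for lab in labels]
        models[c] = train_binary(X, targets, penalty)
    return models

def predict(models, x):
    """Label a TR with the category whose model gives maximal evidence."""
    evidence = {c: sigmoid(decision(w, b, x)) for c, (w, b) in models.items()}
    return max(evidence, key=evidence.get)

def make_toy_data(n_per_class, seed):
    """Synthetic two-'voxel' patterns (illustrative only, not fMRI data)."""
    rng = random.Random(seed)
    centers = {"face": (2.0, 0.0), "scene": (0.0, 2.0), "blank": (0.0, 0.0)}
    X, labels = [], []
    for c in CLASSES:
        for _ in range(n_per_class):
            X.append([m + rng.gauss(0.0, 0.3) for m in centers[c]])
            labels.append(c)
    return X, labels

X_train, y_train = make_toy_data(20, seed=0)
X_test, y_test = make_toy_data(10, seed=1)
models = train_one_vs_rest(X_train, y_train, penalty=1.0)
accuracy = sum(predict(models, x) == y for x, y in zip(X_test, y_test)) / len(y_test)
print(f"toy three-way accuracy: {accuracy:.2f}")
```

Because each TR is labeled by the model with maximal evidence, the three binary models do not need calibrated outputs; only their relative ordering for a given TR matters.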
To quantify classification accuracy, we averaged the results of 6-fold cross-validation. The classifier in each fold was trained on five-sixths of the data and tested on the left-out one-sixth. Because only one localizer run was used for this cross-validation (the other was used to independently define selectivity), the training and test sets in every fold came from the same fMRI run. Data from the same run can have dependencies, both locally, when activity in the previous block spills over into the current block, and globally, as a result of non-task factors like head motion or arousal. Despite this, our within-run approach was unbiased. With respect to local dependencies, all conditions being classified were present in each run and alternated with one another, and thus any spill-over (into a period with a different label) would hurt rather than help performance. With respect to global dependencies, because the full design was again present within each run (and within each training and test set), any general factors would apply to all conditions and would not systematically support classification between conditions. Chance classification accuracy was calculated empirically by randomly permuting the category labels across TRs in the localizer run before performing MVPA (block-level scrambling produced identical results). This process was repeated 10,000 times for each participant, and the average classifier accuracy across permutations and participants provided the baseline level of performance that would be expected by chance.
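The cross-validation and permutation baseline described above can be sketched as follows. Several details here are assumptions rather than the procedure actually used: the folds are taken as contiguous sixths of the run, a nearest-centroid rule stands in for the logistic regression classifier to keep the example short, the data are synthetic, and all function names are hypothetical. The demo uses 200 permutations instead of 10,000 so that it runs quickly.

```python
import random

def k_fold_indices(n, k=6):
    """Split range(n) into k contiguous test folds (assumed fold layout)."""
    bounds = [(i * n) // k for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]

def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Stand-in classifier (not logistic regression): nearest class centroid."""
    sums, counts = {}, {}
    for x, y in zip(X_train, y_train):
        s = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            s[j] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in X_test]

def cross_val_accuracy(X, labels, fit_predict, k=6):
    """Average test accuracy over k folds (train on the rest, test on one fold)."""
    fold_acc = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx],
                            [labels[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct = sum(p == labels[i] for p, i in zip(preds, test_idx))
        fold_acc.append(correct / len(test_idx))
    return sum(fold_acc) / len(fold_acc)

def permutation_baseline(X, labels, fit_predict, n_perm=10000, k=6, seed=0):
    """Empirical chance: mean cross-validated accuracy with TR labels shuffled."""
    rng = random.Random(seed)
    shuffled = list(labels)
    total = 0.0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute category labels across TRs
        total += cross_val_accuracy(X, shuffled, fit_predict, k)
    return total / n_perm

def make_toy_run(n_cycles=6, trs_per_block=4, seed=0):
    """Synthetic run with alternating face/scene/blank blocks (illustrative only)."""
    rng = random.Random(seed)
    centers = {"face": (2.0, 0.0), "scene": (0.0, 2.0), "blank": (0.0, 0.0)}
    X, labels = [], []
    for _ in range(n_cycles):
        for c in ("face", "scene", "blank"):
            for _ in range(trs_per_block):
                X.append([m + rng.gauss(0.0, 0.3) for m in centers[c]])
                labels.append(c)
    return X, labels

X, labels = make_toy_run()
observed = cross_val_accuracy(X, labels, nearest_centroid_fit_predict)
chance = permutation_baseline(X, labels, nearest_centroid_fit_predict, n_perm=200)
print(f"observed accuracy: {observed:.2f}, permutation baseline: {chance:.2f}")
```

In this toy layout each contiguous fold contains one block of every condition, mirroring the point that the full design is present in both training and test sets. Shuffling at the level of TRs, as here, breaks the block structure; a block-level variant would shuffle one label per block instead.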