The sample was based on a corpus of psychology articles that had been examined in a previous project investigating the impact of an ‘open badges’ scheme introduced at Psychological Science [15]. This sample was selected because the open badges scheme and assessment of reusability by Kidwell et al. [15] enabled us to largely circumvent upstream issues related to data availability and data management and instead focus on downstream issues related to analytic reproducibility. A precision analysis indicates that the sample size affords adequate precision for the purposes of gauging policy compliance (electronic supplementary material, section B).

Of 47 articles marked with an open data badge, Kidwell and colleagues had identified 25 with datasets that met four reusability criteria (accessible, correct, complete and understandable). For each of these articles, one investigator (T.E.H.) attempted to identify a coherent set of descriptive and inferential statistics (e.g. means, standard deviations, t-values, p-values; figure 3), roughly 2–3 paragraphs of text, sometimes including a table or figure, related to a ‘substantive’ finding based on ‘relatively straightforward’ analyses. We focused on substantive findings because they are the most important, and on relatively straightforward analyses to ensure that our team had sufficient expertise to re-run them. In total, 789 discrete numerical values reported in 25 articles published between January 2014 and May 2015 were designated as target values. Further information about the sample is available in electronic supplementary material, section B.

Frequency of reproducibility outcomes by value type. Variation/uncertainty measures include standard deviations, standard errors and confidence intervals. Effect sizes include Cohen's d, Pearson's r, partial eta squared and phi. Test statistics include t, F and χ². Central tendency measures include means and medians.

