Quality of reporting evaluation

Clarissa F. D. Carneiro, Victor G. S. Queiroz, Thiago C. Moulin, Carlos A. M. Carvalho, Clarissa B. Haas, Danielle Rayêe, David E. Henshall, Evandro A. De-Souza, Felippe E. Amorim, Flávia Z. Boos, Gerson D. Guercio, Igor R. Costa, Karina L. Hajdu, Lieve van Egmond, Martin Modrák, Pedro B. Tan, Richard J. Abdill, Steven J. Burgess, Sylvia F. S. Guerra, Vanessa T. Bortoluzzi, Olavo B. Amaral

Evaluation of each study was performed through an online questionnaire implemented on Google Forms. Questions were based on existing reporting guidelines [6, 13, 25, 30], journal checklists [32] and previous studies on quality of reporting [19, 40], and are presented along with their response options in Table S1. They relied on direct, objective criteria in an attempt to avoid the need for subjective evaluation. Analyzed reporting items included measures to reduce risk of bias (e.g. blinding, conflict of interest reporting), details on reagents (e.g. antibody validation, reagent source), data presentation (e.g. summary and variation measures, identifiable groups, definition of symbols used), data analysis (e.g. statistical tests used, exact p values) and details on the biological model (e.g. culture conditions, animal species and strain, human subject recruitment and eligibility, ethical requirements). As not all of these apply to every article, some questions were category-specific, while others could be answered as ‘not applicable’. A detailed Instructions Manual for answering the questions (available as Supplementary Text 1) was distributed to evaluators to standardize interpretation. Importantly, most questions concerned only the result selected for analysis (i.e. the first table, figure or subpanel fulfilling our inclusion criteria) and not the article’s whole set of results.
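
As an illustration of how questionnaire answers can translate into a per-article reporting score, the sketch below computes the fraction of applicable items reported. The ‘Yes’/‘No’/‘Not applicable’ coding, the item names and the use of pandas are assumptions for this example rather than the study’s actual data structure; the full questionnaire and response options are in Table S1.

```python
import pandas as pd

def reporting_score(answers: pd.Series) -> float:
    """Fraction of applicable items answered 'Yes' for one article."""
    applicable = answers[answers != "Not applicable"]
    if applicable.empty:
        return float("nan")
    return (applicable == "Yes").mean()

# Hypothetical consensus answers for one article, keyed by questionnaire item
article = pd.Series({
    "blinding_reported": "Yes",
    "conflict_of_interest_statement": "Yes",
    "antibody_validation": "Not applicable",
    "exact_p_values": "No",
})
print(f"Reporting score: {reporting_score(article):.2f}")  # 0.67
```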

Two additional questions regarding evaluators’ subjective assessments were included in the questionnaire, to be answered on a five-point scale. The first asked whether the title and abstract provided a clear idea of the article’s main findings, ranging from “Not clear at all” to “Perfectly clear”. The second asked whether the information required by the questionnaire was easy to find and extract from the article, ranging from “Very hard” to “Very easy”.

Evaluators were biomedical researchers recruited locally at Brazilian universities and online through the ASAPbio blog [2] and social media. To be included as evaluators, candidates had to reach at least 75% agreement with consensus answers on a test set of 4 articles. Consensus answers for 2 sets of 4 articles were established by 3 members of the coordinating team (C.F.D.C., T.C.M. and O.B.A.) after extensive discussion of possible disagreements. A candidate who failed to reach the required level of agreement on the first set could try again on the second set after reviewing their own answers alongside the consensus for the first test. After achieving the agreement threshold, evaluators had access to the consensus answers as well as their own for the evaluated set(s).
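
The qualification check amounts to a percent-agreement calculation against the consensus key, as sketched below. The dictionary layout (question ID mapped to response) and function names are assumptions for illustration.

```python
def percent_agreement(candidate: dict, consensus: dict) -> float:
    """Share of consensus questions where the candidate gave the same answer."""
    matches = sum(candidate.get(q) == a for q, a in consensus.items())
    return matches / len(consensus)

def passes_qualification(candidate: dict, consensus: dict,
                         threshold: float = 0.75) -> bool:
    """Apply the 75% agreement cut-off used to include evaluators."""
    return percent_agreement(candidate, consensus) >= threshold
```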

As the paired-sample comparison started almost a year after the independent-samples one, we sought to determine whether the initial analysis of preprints could be reused for the paired sample. For this, we computed correlations between evaluation time and score for each evaluator in the first stage and compared the mean r value to zero. Additionally, we performed equivalence tests comparing the score obtained in the first stage with the score from an independent reanalysis by a single evaluator in the second stage for a sample of 35 preprints. Although there was no clear evidence that individual evaluators changed their scoring over time, the equivalence test (with an estimated power of 90% to detect equivalence at ±5% with α = 0.05) failed to provide statistical evidence of equivalence at the ±5% bound (see https://osf.io/g3ehr/ and https://osf.io/h7s3g/ for details). Therefore, all preprints included in the paired-sample comparison were reanalyzed to avoid any time-related bias in the comparison between preprints and their published versions.
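
The two checks described above could be implemented as sketched below: a per-evaluator correlation of score with evaluation time followed by a one-sample t-test of the mean r against zero, and a paired two one-sided tests (TOST) procedure at the ±5% bound. Variable names, the assumption that scores are on a 0–1 scale (so that 5% corresponds to 0.05) and the exact test implementations are illustrative; the registered analyses are available at the OSF links above.

```python
import numpy as np
from scipy import stats

def mean_r_vs_zero(per_evaluator):
    """Correlate score with evaluation time for each evaluator (list of
    (times, scores) arrays), then test whether the mean r differs from zero."""
    rs = [stats.pearsonr(times, scores)[0] for times, scores in per_evaluator]
    return stats.ttest_1samp(rs, popmean=0.0)

def tost_paired(first_stage, second_stage, bound=0.05):
    """Two one-sided t-tests: is the mean paired score difference within ±bound?"""
    diff = np.asarray(second_stage) - np.asarray(first_stage)
    n = len(diff)
    se = diff.std(ddof=1) / np.sqrt(n)
    t_lower = (diff.mean() + bound) / se  # H0: mean difference <= -bound
    t_upper = (diff.mean() - bound) / se  # H0: mean difference >= +bound
    p_lower = 1 - stats.t.cdf(t_lower, df=n - 1)
    p_upper = stats.t.cdf(t_upper, df=n - 1)
    return max(p_lower, p_upper)  # equivalence is supported if this p < alpha
```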

Each article was assessed independently by three evaluators, and the most prevalent answer among them for each question was considered final (except for subjective assessments, where the final score was the mean of the three evaluations). If all three evaluators reached different answers (a possibility arising when more than two response options were available), the question was discussed by the coordinating team until consensus was reached.
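
The consensus rule is a straightforward majority vote per question, with three-way splits flagged for discussion by the coordinating team. A minimal sketch, assuming each question's answers are collected as a list of three strings:

```python
from collections import Counter

def consensus_answer(answers):
    """Return the majority answer among three evaluators, or None when all
    three answers differ (flagging the question for team discussion)."""
    top, count = Counter(answers).most_common(1)[0]
    return top if count >= 2 else None

print(consensus_answer(["Yes", "Yes", "No"]))             # 'Yes'
print(consensus_answer(["Yes", "No", "Not applicable"]))  # None -> discuss
```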

PDF files were redacted so that evaluators were blinded to the journal, author list, affiliations and funding sources. However, some of this information could still be inferred from the formatting of the PDF file or from methodological details (such as the ethics committee or the place of sample collection). As we considered typesetting to be a direct consequence of the editorial process, we chose to maintain the original formatting of articles, which meant that most journal articles were recognizable as such. Consequently, evaluators were not blinded to each article’s group of origin.
