Bot detection and protection

Marybec Griffin
Richard J. Martino
Caleb LoSchiavo
Camilla Comer-Carruthers
Kristen D. Krause
Christopher B. Stults
Perry N. Halkitis

The detection of bots and the protection of data quality is a three-fold process comprising intentional design choices at the survey, recruitment, and data-cleaning steps. Individually, none of these steps is sufficient to reduce the number of bots; taken together, however, they create a strong protocol for reducing the number of bots that take internet-based surveys and for removing the bot responses that bypass the other safeguards.

For surveys, a number of built-in protection settings are available across survey platforms. Qualtrics offers several: prevent ballot box stuffing (a tool that places a cookie in the browser once a person has submitted a response); reCAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), a question placed before the survey that asks the respondent to identify certain items in pictures or replicate a series of letters; bot detection (a Qualtrics feature that records a reCAPTCHA score reflecting the probability that the respondent is a bot); and HTTP referer verification (an option that verifies all responses come from a specific link). Although these settings were activated at the launch of the survey, sophisticated bots were able to bypass them.

Data were collected in two waves, before and after bot detection. Initially, research staff conducted recruitment between May 7 and 8. Discrepancies between the numbers of completed COVID-19 surveys and incentive surveys prompted staff to pause recruitment and examine these inconsistencies; the number of completed incentive surveys was much higher, indicating that bots had specifically targeted the incentive survey. Creating two separate surveys, and thereby unlinking the data from the incentive survey, proved to be an effective strategy for reducing the likelihood of bots compromising the integrity of the data, but it offered no protection against depleting the research funds designated for participant incentives.
Once the initial data were purged of bot responses, as described below, the research team disseminated the survey again over a two-day trial period between June 11 and 12, using the same venues as in the initial recruitment.
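The purging step described above can be sketched as a simple screening pass over exported responses. This is a minimal illustration, not the authors' actual cleaning script: the column names (recaptcha_score, ip_address, duration_sec) and the thresholds are assumptions, since Qualtrics exports label these fields differently depending on survey configuration.

```python
def flag_likely_bots(responses, score_threshold=0.5, min_duration_sec=120):
    """Split responses into (kept, flagged) using three common heuristics:
    a low reCAPTCHA score, a repeated IP address, or an implausibly fast
    completion time. Thresholds here are illustrative assumptions."""
    seen_ips = set()
    kept, flagged = [], []
    for r in responses:
        suspicious = (
            r["recaptcha_score"] < score_threshold    # bot-detection score too low
            or r["ip_address"] in seen_ips            # repeat submission from same IP
            or r["duration_sec"] < min_duration_sec   # finished too quickly to be human
        )
        seen_ips.add(r["ip_address"])
        (flagged if suspicious else kept).append(r)
    return kept, flagged

# Hypothetical exported records for illustration
sample = [
    {"id": 1, "recaptcha_score": 0.9, "ip_address": "1.2.3.4", "duration_sec": 600},
    {"id": 2, "recaptcha_score": 0.1, "ip_address": "5.6.7.8", "duration_sec": 45},
    {"id": 3, "recaptcha_score": 0.8, "ip_address": "1.2.3.4", "duration_sec": 500},
]
kept, flagged = flag_likely_bots(sample)
```

In practice, flagged responses would be reviewed manually before removal, since legitimate respondents can share an IP address (for example, members of one household).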
