2.3. Development of the Search

Juan F. Garcia, M. Jose Diez, Ana M. Sahagun, Raquel Diez, Matilde Sierra, Juan J. Garcia, M. Nelida Fernandez

Most of the extraction process was automated with purpose-built software (see Section 2.4, "Information extraction process", for details). Automation not only simplified the tasks to be performed but also improved the precision of the results, reducing both fatigue and the propensity for human error. Our contribution can thus be summarized as the simple and precise extraction of large volumes of reliable data, achieved mainly through software automation.

Thanks in part to this automation, we analyzed more than 3000 websites, a much larger volume than that covered in comparable studies, in which researchers examined only the first few result pages returned by these engines (about 100 or 200 websites at best, and possibly far fewer after discarding duplicates and advertisements) [29,30].

Contrary to popular belief, search engines' terms of service do not prohibit automated searches (performed with software tools such as bots or crawlers). What they prohibit are searches performed by means other than those the engine provides (its graphical interface, for example) and searches that abuse the system: automated queries that make excessive use of the service, since thousands of requests per hour could slow or even block the services offered to the public [31].
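The "reasonable use" constraint above can be enforced by spacing out requests. The following sketch is a hypothetical throttling helper, not the authors' actual implementation; the class name and the per-minute limit are assumptions made for illustration.

```java
// Hypothetical throttle: guarantees a minimum gap between consecutive
// search requests so the engine is never queried excessively.
public class SearchThrottle {
    private final long minIntervalMillis; // minimum gap between requests
    private long lastRequestTime = 0;     // timestamp of the previous request

    public SearchThrottle(int maxRequestsPerMinute) {
        this.minIntervalMillis = 60_000L / maxRequestsPerMinute;
    }

    // Blocks until enough time has passed since the last request.
    public synchronized void acquire() throws InterruptedException {
        long now = System.currentTimeMillis();
        long waitFor = lastRequestTime + minIntervalMillis - now;
        if (waitFor > 0) {
            Thread.sleep(waitFor);
        }
        lastRequestTime = System.currentTimeMillis();
    }

    public static void main(String[] args) throws InterruptedException {
        // 120 requests per minute -> at least 500 ms between queries.
        SearchThrottle throttle = new SearchThrottle(120);
        long start = System.currentTimeMillis();
        for (int i = 0; i < 3; i++) {
            throttle.acquire();
            // ...perform one search here...
        }
        long elapsed = System.currentTimeMillis() - start;
        // Three requests at 500 ms spacing take at least ~1000 ms in total.
        System.out.println(elapsed >= 1000 ? "throttled" : "too fast");
    }
}
```

Each search call would simply invoke `acquire()` first; the chosen limit can be tuned to whatever the engine's terms of service deem acceptable.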

To perform searches within these conditions, we developed a piece of Java software that drives a web browser (Firefox or Google Chrome), through which it accesses a search engine and collects the results directly.
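Once the browser has loaded a results page, the collection step amounts to reading the page's HTML and pulling out the result URLs. The sketch below is a deliberately simplified, self-contained illustration of that step: the class name, the toy HTML snippet, and the regex-based approach are assumptions for the example (real result pages are more complex, and a proper HTML parser or browser-automation API would be used in practice).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy result extractor: scans an HTML string for href attributes and
// returns the linked URLs. Sufficient only for this simplified snippet.
public class ResultExtractor {

    private static final Pattern HREF =
            Pattern.compile("href=\"(https?://[^\"]+)\"");

    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String page =
                "<div class=\"result\"><a href=\"https://example.org/a\">A</a></div>"
              + "<div class=\"result\"><a href=\"https://example.com/b\">B</a></div>";
        System.out.println(extractLinks(page));
        // -> [https://example.org/a, https://example.com/b]
    }
}
```

In the actual pipeline, the HTML string would come from the browser-controlled session rather than a literal, and the extracted URLs would feed the per-website analysis described in the following sections.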
