
Each author’s gender was estimated using genderize.io (https://genderize.io/) and Gender API (https://gender-api.com/). Gender API is the most comprehensive platform available; it assigns the most probable gender based on first name, drawing on more than 6 million validated names from 191 countries. Queries to this API can include a name (first and last) and a location (country name, IP address or browser language). The API returns a gender assignment (men, women or unknown), the number of samples and an accuracy score (from 0 to 100). The genderize.io database includes 250 000 names from 241 countries (114 million entries in total). Queries to this service take a first name and a location (country) and, where possible, return a probable gender (men, women), a count (number of samples) and a probability (range 0–1). Both APIs were accessed through R. We developed a customized R package (https://github.com/hugofitipaldi/genderAPI) through which https://gender-api.com/ can be accessed.
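To make the query and response fields concrete, the following is a minimal Python sketch of a genderize.io lookup (the authors worked in R; Python is used here only for illustration). It builds the request URL from a first name and a two-letter country code and normalizes a response into the men/women/unknown labels used in this protocol. The helper names and the mapping of a missing gender to "unknown" are assumptions of this sketch, not part of the authors' code, and no network call is made.

```python
from urllib.parse import urlencode

# Public genderize.io endpoint; "name" and "country_id" are its query parameters.
GENDERIZE_ENDPOINT = "https://api.genderize.io/"


def build_genderize_url(first_name, country_code=None):
    """Return the request URL for one first name (hypothetical helper)."""
    params = {"name": first_name}
    if country_code:
        # Two-letter country code, e.g. "SE"
        params["country_id"] = country_code
    return GENDERIZE_ENDPOINT + "?" + urlencode(params)


def parse_genderize_response(payload):
    """Map a decoded JSON response to the labels used in the text.

    Assumption: a null gender in the response is recorded as "unknown".
    """
    label = {"male": "men", "female": "women"}.get(payload.get("gender"), "unknown")
    return {
        "gender": label,
        "probability": payload.get("probability") or 0.0,  # range 0-1
        "count": payload.get("count") or 0,  # number of samples
    }
```

In practice the URL would be fetched with an HTTP client and the JSON body passed to `parse_genderize_response`.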

We passed the co-authorship list through a series of text-tidying steps, which included splitting authors’ names into first and last names and removing abbreviations and accents. Among the 64 061 unique author names, those from a small number of studies (3%) were reported in a format that abbreviates first names, which precluded their use in either gender-estimation tool; these papers were excluded from the gender analysis. For genderize.io, we queried the API with each author’s first name and first country of affiliation. For Gender API, we also included the author’s last name in the query.
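The tidying steps described above can be sketched roughly as follows. This is an illustrative Python sketch, not the authors' R code: it assumes names arrive as "First [Middle] Last", and the function names and the rule for detecting abbreviated first names are assumptions made for this example.

```python
import re
import unicodedata


def strip_accents(text):
    """Remove combining accent marks, e.g. 'José' becomes 'Jose'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_abbreviated(first_name):
    """Return True for initials such as 'J', 'J.' or 'J.K.' (assumed rule)."""
    letters = first_name.replace(".", "").replace("-", "")
    if len(letters) <= 1:
        return True
    return re.fullmatch(r"(?:[A-Za-z]\.-?)+", first_name) is not None


def split_author_name(full_name):
    """Split a 'First [Middle] Last' string into tidy first and last names."""
    tokens = strip_accents(full_name).replace(",", " ").split()
    first = tokens[0] if tokens else ""
    last = tokens[-1] if len(tokens) > 1 else ""
    return {"first": first, "last": last, "abbreviated": is_abbreviated(first)}
```

Records flagged as abbreviated would then be set aside before querying either API, since an initial alone carries no usable gender signal.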

A comparison of the name-gender classifications for the 64 061 valid names included in both tools’ databases showed no statistically significant difference (see Supplementary Material, Table S2), leading us to conclude that both methods are equally valid. The APIs made identical predictions for 93% of the names. For 4% of the names, one of the APIs could not make a conclusive prediction on gender (‘unknown’), and for 3% the two APIs disagreed on the men/women assignment. In the first case, we used the result of the API that was able to predict gender; in the second, we chose the result with the higher accuracy.
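The reconciliation rule described above can be sketched as a small function. This Python sketch rests on two assumptions not stated in the text: the genderize.io probability (0–1) is multiplied by 100 so it can be compared with the Gender API accuracy (0–100), and a tie is resolved in favour of Gender API. The function name and dictionary keys are hypothetical.

```python
def reconcile(genderize, gender_api):
    """Combine two predictions into one label: 'men', 'women' or 'unknown'.

    genderize:  {"gender": str, "probability": float in 0-1}
    gender_api: {"gender": str, "accuracy": float in 0-100}
    """
    g_label = genderize["gender"]
    a_label = gender_api["gender"]

    # Both APIs agree (including the case where both are 'unknown').
    if g_label == a_label:
        return g_label

    # One API was inconclusive: use the one that made a prediction.
    if g_label == "unknown":
        return a_label
    if a_label == "unknown":
        return g_label

    # The APIs disagree on men/women: keep the higher-scoring prediction.
    # Assumption: rescale probability to 0-100; ties go to Gender API.
    g_score = genderize["probability"] * 100
    a_score = gender_api["accuracy"]
    return g_label if g_score > a_score else a_label
```

For example, a name labelled "women" with probability 0.60 by genderize.io and "men" with accuracy 85 by Gender API would be recorded as "men" under these assumptions.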
