Network of English Wikipedia articles of 2017

Guillaume Rollin
José Lages
Dima L. Shepelyansky

We analyze the English-language edition of Wikipedia collected in May 2017 (ENWIKI2017) [30], containing N = 5 416 537 articles (nodes) connected by Nl = 122 232 932 directed hyperlinks between articles (without self-citations). From this data set we extract the Ncr = 37 types of cancer listed at [24]. From [25] we also collect the names of drugs related to cancer, obtaining a list of Nd = 203 drugs present in Wikipedia. The lists of 37 cancer types and 203 drugs are given in Tables 1 and 2. This reduced set of Nr = 240 nodes is illustrated in the inset of Fig 1. For investigations of global influence, it is complemented by Ncn = 195 world countries listed in [28]. Thus in total we have a reduced network of Nr = Ncr + Nd + Ncn = 435 ≪ N nodes embedded in the global network of more than 5 million nodes. All data sets are available at [28]. The present study complies with the Wikimedia terms of use.
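The node bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration on toy data, not the authors' pipeline: the function name `reduced_subnetwork`, the placeholder article titles and the edge-list representation are all assumptions made for the example. It keeps only the direct hyperlinks whose two endpoints both belong to the selected set, drops self-links, and checks the arithmetic Ncr + Nd = 240 and Ncr + Nd + Ncn = 435.

```python
# Counts quoted in the text.
N_CR, N_D, N_CN = 37, 203, 195   # cancers, drugs, countries
N_TOTAL = 5_416_537              # articles in ENWIKI2017


def reduced_subnetwork(edges, selected):
    """Return the directed links (source, target) with both ends in `selected`.

    Self-links and duplicate links are discarded; the result is sorted so
    that it is reproducible. Hypothetical helper, for illustration only.
    """
    keep = set(selected)
    return sorted({(s, t) for s, t in edges
                   if s != t and s in keep and t in keep})


# Toy edge list with placeholder titles (NOT real Wikipedia data).
toy_edges = [
    ("cancer_a", "drug_x"),
    ("drug_x", "cancer_a"),
    ("cancer_a", "cancer_a"),      # self-link, removed
    ("cancer_a", "other_page"),    # target outside the selected set
    ("country_y", "cancer_a"),
    ("other_page", "country_y"),   # source outside the selected set
]
toy_selected = ["cancer_a", "drug_x", "country_y"]

toy_links = reduced_subnetwork(toy_edges, toy_selected)

n_r_small = N_CR + N_D           # cancers and drugs only
n_r = N_CR + N_D + N_CN          # with countries added
fraction = n_r / N_TOTAL         # share of all articles, far below 1
```

On the toy input, three links survive the filter; the selected nodes are a tiny fraction of the full network, which is what the relation 435 ≪ N expresses.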

Bottom right inset: subnetwork of Nr = 240 articles comprising Ncr = 37 articles devoted to cancers (green nodes) and Nd = 203 articles devoted to cancer drugs (golden nodes). Main figure: subnetwork of the top 20 cancers and top 20 cancer drugs extracted from the ranking of the 2017 English Wikipedia using the PageRank algorithm (see Table 3). The bulk of the other Wikipedia articles is not shown. Arrows symbolize hyperlinks between cancer and cancer drug articles in the global Wikipedia.
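Selecting the "top 20" articles of a group from a global PageRank ranking can be sketched as follows. This is a hedged illustration only: the damping factor `alpha = 0.85` is a conventional choice assumed here (the text above does not state it), the helper names `pagerank` and `top_k` are hypothetical, and the four-node graph is a toy, not Wikipedia data. PageRank is computed on the whole graph by power iteration, and only afterwards is the ranking restricted to the subset of interest.

```python
def pagerank(edges, nodes, alpha=0.85, tol=1e-12, max_iter=1000):
    """PageRank by power iteration; returns {node: probability}.

    alpha is the damping factor (assumed value). Nodes without outgoing
    links spread their weight uniformly over all nodes.
    """
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    out_links = [set() for _ in range(n)]
    for s, t in edges:
        if s != t:                       # ignore self-links
            out_links[index[s]].add(index[t])

    p = [1.0 / n] * n
    for _ in range(max_iter):
        dangling = sum(p[i] for i in range(n) if not out_links[i])
        new = [(1.0 - alpha) / n + alpha * dangling / n] * n
        for i in range(n):
            if out_links[i]:
                share = alpha * p[i] / len(out_links[i])
                for j in out_links[i]:
                    new[j] += share
        converged = sum(abs(a - b) for a, b in zip(new, p)) < tol
        p = new
        if converged:
            break
    return dict(zip(nodes, p))


def top_k(scores, subset, k):
    """Return the k members of `subset` with the highest global score."""
    return sorted(subset, key=lambda name: scores[name], reverse=True)[:k]


# Toy graph: A, C and D all link to B, and B links back to A.
toy_nodes = ["A", "B", "C", "D"]
toy_edges = [("A", "B"), ("C", "B"), ("D", "B"), ("B", "A")]

scores = pagerank(toy_edges, toy_nodes)
best_two = top_k(scores, ["A", "C", "B"], 2)
```

Ranking on the full graph and filtering afterwards keeps the influence of all other articles in the scores, even though those articles are not drawn in the figure.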