Two recent sources of coronavirus-affected human gene/protein sets were used in this work:

The COVID-19 Drug and Gene Set Library provides a collection of drug and gene sets related to COVID-19 research, aggregated from multiple sources using natural language processing techniques (downloaded on 24 September 2020) [54]. This set of genes/proteins is hereafter referred to as the “Aggregated” set. In this set, human genes/proteins are scored by the number of times they have been reported as related to COVID-19, with the top score of 88 assigned to the STAT1 gene. We generated several subselections of genes using score cut-offs of 40, 30, 25 and 20 (containing 72, 143, 248 and 457 genes), referred to as “score_40”, “score_30”, “score_25” and “score_20”, respectively. In these subselections, all genes are initially weighted equally when propagated through the interactome, and the chosen score threshold serves as an adjustable model parameter. We also included a set of the 5000 top-scoring genes (with a minimal score of 5), referred to as “score_5_weighted”, in which each gene is weighted by its score for the propagation. Finally, we included a minimal entry set, referred to as “entry_only”, consisting of the CTSB, CTSL, TMPRSS2 and ACE2 genes [55], which were the first reported in the literature as involved in the initial entry of the virus.
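The construction of these subselections can be illustrated with a minimal Python sketch. It is not the code used in this work. The function names and the toy score table are hypothetical, the placeholder genes GENE_A to GENE_E and their scores are invented for illustration (only the STAT1 score of 88 comes from the text), and the cut-offs are assumed to be inclusive (score greater than or equal to the threshold), which the text does not state explicitly.

```python
from typing import Dict

# Minimal entry set named in the text.
ENTRY_GENES = ("CTSB", "CTSL", "TMPRSS2", "ACE2")


def uniform_seed_set(scores: Dict[str, int], cutoff: int) -> Dict[str, float]:
    """Genes at or above the cutoff, all with equal weight 1.0.

    Assumption: the cutoff is inclusive (score >= cutoff).
    """
    return {gene: 1.0 for gene, score in scores.items() if score >= cutoff}


def weighted_seed_set(
    scores: Dict[str, int], min_score: int = 5, top_n: int = 5000
) -> Dict[str, float]:
    """Up to top_n highest-scoring genes with score >= min_score,
    each weighted by its own score."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    kept = [(gene, score) for gene, score in ranked if score >= min_score]
    return {gene: float(score) for gene, score in kept[:top_n]}


def build_seed_sets(scores: Dict[str, int]) -> Dict[str, Dict[str, float]]:
    """Assemble the named seed sets as gene -> initial weight mappings."""
    seed_sets = {
        f"score_{cutoff}": uniform_seed_set(scores, cutoff)
        for cutoff in (40, 30, 25, 20)
    }
    seed_sets["score_5_weighted"] = weighted_seed_set(scores)
    seed_sets["entry_only"] = {gene: 1.0 for gene in ENTRY_GENES}
    return seed_sets


# Toy score table: placeholder genes and scores are illustrative only.
toy_scores = {
    "STAT1": 88,
    "GENE_A": 41,
    "GENE_B": 30,
    "GENE_C": 22,
    "GENE_D": 5,
    "GENE_E": 3,
}
seed_sets = build_seed_sets(toy_scores)
```

Representing every set as a mapping from gene to initial weight lets the equally weighted and score-weighted variants be passed to the same propagation step, with the cut-off remaining a single adjustable parameter.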

The COVID-19 Pathways Portal on WikiPathways [23] was used to create a subset of 423 coronavirus-affected human genes/proteins, referred to as “score_wiki”.

