Predicting evolutionary domain from topology
Universal scaling across biochemical networks on Earth
Sci Adv, Jan 16, 2019; DOI: 10.1126/sciadv.aau0149

To demonstrate that the topological features of genomes from different domains are distinct, we used multinomial regression. Specifically, we implemented models in which the domain of the network was the response class and a single topological feature, normalized by the size of the LCC of the network, was the independent variable. We found that topological features of networks alone were often not predictive of the domain, but that the ratio of a topological property to the size of the network provided a more accurate prediction. Prior to the regression, these normalized topological measures were scaled and centered (61). The regression was implemented in base R using the glm(..) function. To control for overfitting, the training data were composed of an equal number of samples from each domain: 35 networks of each domain were sampled for training, and the model was tested on the remaining data. This process was repeated 100 times, and the average model error is reported in the text (Fig. 6E).
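The resampling procedure described above can be illustrated with a minimal sketch. It is written in Python with only the standard library rather than R, and it is not the code used in the study. Every function name, the gradient-descent fitting routine, its learning rate and epoch count, and the synthetic feature values are assumptions introduced for illustration. The sketch standardizes one size-normalized feature, draws a balanced training set per domain, fits a single-predictor multinomial (softmax) model, scores it on the held-out networks, and averages the error over repeated draws.

```python
import math
import random
import statistics


def standardize(values):
    """Center to zero mean and scale to unit standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def balanced_split(labels, n_per_class, rng):
    """Sample n_per_class training indices per class; the rest are test."""
    train = []
    for cls in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cls]
        train.extend(rng.sample(members, n_per_class))
    chosen = set(train)
    test = [i for i in range(len(labels)) if i not in chosen]
    return train, test


def softmax_probs(xi, w, b):
    """Class probabilities for one feature value xi."""
    scores = {c: w[c] * xi + b[c] for c in w}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def fit_multinomial(x, y, classes, lr=1.0, epochs=100):
    """Single-predictor multinomial regression via batch gradient descent.

    Assumption: a simple stand-in for a library fitting routine.
    """
    w = {c: 0.0 for c in classes}
    b = {c: 0.0 for c in classes}
    n = len(x)
    for _ in range(epochs):
        gw = {c: 0.0 for c in classes}
        gb = {c: 0.0 for c in classes}
        for xi, yi in zip(x, y):
            p = softmax_probs(xi, w, b)
            for c in classes:
                err = p[c] - (1.0 if c == yi else 0.0)
                gw[c] += err * xi
                gb[c] += err
        for c in classes:
            w[c] -= lr * gw[c] / n
            b[c] -= lr * gb[c] / n
    return w, b


def predict(xi, w, b):
    """Most probable class for one feature value."""
    p = softmax_probs(xi, w, b)
    return max(p, key=p.get)


def mean_test_error(feature, labels, n_per_class=35, repeats=100, seed=0):
    """Average held-out misclassification rate over repeated balanced draws."""
    rng = random.Random(seed)
    x = standardize(feature)
    classes = sorted(set(labels))
    errors = []
    for _ in range(repeats):
        train, test = balanced_split(labels, n_per_class, rng)
        w, b = fit_multinomial([x[i] for i in train],
                               [labels[i] for i in train], classes)
        wrong = sum(predict(x[i], w, b) != labels[i] for i in test)
        errors.append(wrong / len(test))
    return statistics.mean(errors)


def synthetic_networks(n_per_domain=60, seed=1):
    """Made-up normalized feature values; NOT data from the study."""
    rng = random.Random(seed)
    centers = {"archaea": 0.2, "bacteria": 0.5, "eukarya": 0.8}
    feature, labels = [], []
    for domain, center in centers.items():
        for _ in range(n_per_domain):
            feature.append(rng.gauss(center, 0.05))
            labels.append(domain)
    return feature, labels
```

Calling mean_test_error(*synthetic_networks()) runs the full 100 repetitions with 35 training networks per domain. In this sketch the feature is standardized once over all networks before splitting; computing the scaling from the training set only would be an equally plausible reading of the description.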

