This paper builds on a multidimensional network analysis employing methods of statistical mechanics, such as descriptive and statistical-inference analysis, parametric fitting, and non-parametric estimation methods [84–88], to study the uneven spread of the COVID-19 pandemic. The descriptive methods are graphical and display different aspects of the distributions of the available data in one of three settings: spatial (spatial distribution maps, see [89]), single-variable (boxplots showing the median, the Q1 and Q3 quartiles, and potential outliers and extreme values), or pair-wise (boxplots and scatter plots of ordered pairs of numeric values corresponding to different variables) [86,90]. For statistical inference, the analysis uses error bars representing 95% confidence intervals (CIs) for the difference between the mean values of groups of cases within a variable [86]. These error bars graphically illustrate an independent-samples t-test of the means [90]. When an error bar intersects the zero line (horizontal axis), the mean values of the two groups cannot be considered statistically different; when it does not, the difference is statistically significant at the 5% level.
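The error-bar reading described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sample data, the unpooled standard error, and the use of a normal critical value (a large-sample approximation to the t-based interval) are all assumptions made here for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(group_a, group_b, confidence=0.95):
    """Return (difference, lower, upper) for mean(group_a) - mean(group_b).

    Assumption: unpooled standard error and a normal critical value,
    which approximates the t-based interval for reasonably large samples.
    """
    diff = mean(group_a) - mean(group_b)
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return diff, diff - z * se, diff + z * se


def crosses_zero(lower, upper):
    """True when the error bar intersects the zero line (no significant difference)."""
    return lower <= 0.0 <= upper


# Hypothetical values of one variable for two groups of cases.
group_low = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0]
group_high = [15.0, 17.0, 16.0, 18.0, 16.0, 17.0]

diff, lower, upper = mean_difference_ci(group_high, group_low)
print(f"difference = {diff:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
print("crosses zero:", crosses_zero(lower, upper))
```

For small groups, a t critical value with the appropriate degrees of freedom would give a wider interval than the normal approximation used in this sketch.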
Parametric fitting techniques are applied to estimate the parametric curve that best describes the variability of the data displayed in a scatter plot. The candidate fitting curves examined in this part of the analysis are linear (1st-order polynomial, abbreviated Poly1), quadratic (2nd-order polynomial, Poly2), cubic (3rd-order polynomial, Poly3), one-term power-law (Power1), one-term Gaussian (Gauss1), one-term exponential (Exp1), and one-term logarithmic (Log1). All of these fitting curves can be described by the general multivariate linear regression model [86]:
where $f(x)$ is either logarithmic, $f(x) = (\log(x))^m$, or polynomial, $f(x) = x^m$, or exponential, $f(x) = (\exp\{x\})^m$, with $m = 1, 2, 3$, or power, $f(x) = x^a$. The curve-fitting process estimates the parameters $b_i$ and $a$ (where applicable) that best fit the observed data by minimizing the sum of squared differences between the observed values $y_i$ and the fitted values $\hat{y}_i$ [86], as shown in the relation:
$$\min \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$
The parameters are estimated with the Least-Squares Linear Regression (LSLR) method, which assumes that the differences (residuals) between observed and fitted values are normally distributed [86,90].
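As a rough illustration of the least-squares fitting step, the sketch below fits a few one-term candidates and keeps the one with the highest coefficient of determination. Everything in it is an assumption made for illustration: the function names, the synthetic data, the use of R² as the selection criterion, and the simplified form $b_0 + b_1 f(x)$ for each candidate, which only borrows the labels Poly1, Log1, and Exp1 from the text.

```python
from math import exp, log


def fit_line(u, y):
    """Ordinary least squares for y ~ b0 + b1*u; returns (b0, b1, sse)."""
    n = len(u)
    u_bar = sum(u) / n
    y_bar = sum(y) / n
    sxx = sum((ui - u_bar) ** 2 for ui in u)
    sxy = sum((ui - u_bar) * (yi - y_bar) for ui, yi in zip(u, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * u_bar
    # Sum of squared differences between observed and fitted values.
    sse = sum((yi - (b0 + b1 * ui)) ** 2 for ui, yi in zip(u, y))
    return b0, b1, sse


def r_squared(y, sse):
    """Coefficient of determination from the residual sum of squares."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


# Assumed one-term candidates of the form b0 + b1*f(x); labels follow the text.
CANDIDATES = {"Poly1": lambda x: x, "Log1": log, "Exp1": exp}


def best_fit(x, y):
    """Fit every candidate and return (best_name, {name: (b0, b1, R^2)})."""
    results = {}
    for name, f in CANDIDATES.items():
        b0, b1, sse = fit_line([f(xi) for xi in x], y)
        results[name] = (b0, b1, r_squared(y, sse))
    best = max(results, key=lambda name: results[name][2])
    return best, results


# Synthetic data generated from a logarithmic curve, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0 + 3.0 * log(xi) for xi in x]

best, results = best_fit(x, y)
print("best candidate:", best)
for name, (b0, b1, r2) in results.items():
    print(f"{name}: b0 = {b0:.3f}, b1 = {b1:.3f}, R^2 = {r2:.4f}")
```

Candidates that are nonlinear in their parameters, such as a Gaussian term, would need an iterative solver and are left out of this sketch.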
Finally, the non-parametric kernel density estimation (KDE) method estimates the probability density function of a random variable from the sample data in a vector $x$. The estimate is based on a normal kernel function [84,85] and is evaluated at 100 equally spaced points covering the range of the data. In particular, for a univariate, independent, and identically distributed sample $x = (x_1, x_2, \ldots, x_n)$ drawn from a distribution with unknown density $f$ at any given point $x$, the kernel density estimator describes the shape of the probability density function $f$ according to the relation [84,85]:
$$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - x_i) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$$
where $K$ is the kernel (a non-negative function) and $h > 0$ is a smoothing parameter called the bandwidth, which sets the scale of the kernel through $K_h(x) = \frac{1}{h} K(x/h)$. The bandwidth is desirably as small as the bias-variance trade-off allows [91].
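A minimal sketch of a normal-kernel density estimate evaluated at 100 equally spaced points over the data range is given below. The function names and the sample are hypothetical, and the default bandwidth uses the normal-reference rule of thumb $h = \sigma\,(4/(3n))^{1/5}$, which is an assumption of this sketch, since the text does not state how $h$ was chosen.

```python
from math import exp, pi, sqrt
from statistics import stdev


def normal_kernel(u):
    """Standard normal density, used as the kernel K."""
    return exp(-0.5 * u * u) / sqrt(2.0 * pi)


def rule_of_thumb_bandwidth(sample):
    """Assumed default: normal-reference rule h = sigma * (4 / (3n))^(1/5)."""
    n = len(sample)
    return stdev(sample) * (4.0 / (3.0 * n)) ** 0.2


def kde(sample, h=None, num_points=100):
    """Evaluate (1 / (n*h)) * sum_i K((x - x_i) / h) on an equally spaced grid."""
    if h is None:
        h = rule_of_thumb_bandwidth(sample)
    n = len(sample)
    lo, hi = min(sample), max(sample)
    step = (hi - lo) / (num_points - 1)
    grid = [lo + k * step for k in range(num_points)]
    density = [
        sum(normal_kernel((x - xi) / h) for xi in sample) / (n * h)
        for x in grid
    ]
    return grid, density


# Hypothetical univariate sample, for illustration only.
sample = [1.2, 1.9, 2.1, 2.4, 2.8, 3.0, 3.3, 3.9, 4.4, 5.1]
grid, density = kde(sample)
peak = max(range(len(grid)), key=lambda k: density[k])
print(f"bandwidth = {rule_of_thumb_bandwidth(sample):.3f}")
print(f"density peaks near x = {grid[peak]:.2f}")
```

A smaller $h$ follows the sample more closely at the cost of a noisier curve, while a larger $h$ smooths more and can hide features; passing `h` explicitly makes it easy to compare the two.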
Overall, the multilevel analysis builds on statistical mechanics of the available network, socioeconomic, and geographical variables to conceptualize the worldwide uneven spatio-temporal spread of COVID-19 within the context of the global interconnected economy represented by the GTN.