Fama-MacBeth regression is a two-step procedure (online supplemental material 1 pp. 2–3). The first step runs a cross-sectional regression at each point in time; the second step estimates each coefficient as the average of its cross-sectional estimates. Because these estimates may be autocorrelated, we adjust the SE of the average using the Newey-West approach. Mathematically, our method proceeds as follows.
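Step 1: Let T be the length of the time period and M be the number of control variables. For each timestamp t = 1, …, T, we run a cross-sectional regression across locations i:
$$R_{i,t} = c_t + \beta_{\mathrm{temp},t}\,\mathrm{temp}_{i,t} + \beta_{\mathrm{hum},t}\,\mathrm{hum}_{i,t} + \sum_{j=1}^{M} \beta_{j,t}\,\mathrm{control}_{j,i,t} + \varepsilon_{i,t}$$
Step 2: Estimate the average of the regression coefficient estimates obtained from the first step, for each coefficient k:
$$\hat{\beta}_k = \frac{1}{T} \sum_{t=1}^{T} \hat{\beta}_{k,t}$$
We use the Newey-West approach39 to adjust for the time-series autocorrelation and heteroscedasticity in calculating the SEs in the second step. Specifically, the Newey-West estimators can be expressed as
$$S = \frac{1}{T} \left( \sum_{t=1}^{T} e_t^2 + 2 \sum_{l=1}^{L} w_l \sum_{t=l+1}^{T} e_t e_{t-l} \right)$$
where $w_l = 1 - \frac{l}{L+1}$, $e$ represents residuals and $L$ is the lag (online supplemental material 1 pp. 2–3).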
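The two steps and the Newey-West adjustment described above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the Stata code used in the study. The function names, the Bartlett weights 1 − l/(L+1), the 1/T normalisation of the autocovariances, the lag of 3 and the simulated coefficients are all assumptions made for the example.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x_rows, y):
    """OLS with an intercept; returns [intercept, slope_1, ..., slope_k]."""
    design = [[1.0] + list(row) for row in x_rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def newey_west_se(series, max_lag):
    """Newey-West SE of the mean of a time series (Bartlett weights, assumed)."""
    t = len(series)
    mean = sum(series) / t
    e = [v - mean for v in series]  # residuals around the time-series mean
    s = sum(v * v for v in e) / t
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1.0)
        gamma = sum(e[i] * e[i - lag] for i in range(lag, t)) / t
        s += 2.0 * weight * gamma
    return math.sqrt(s / t)


def fama_macbeth(cross_sections, max_lag):
    """cross_sections: one (x_rows, y) pair per time point."""
    # Step 1: one cross-sectional OLS per time point.
    betas = [ols(x_rows, y) for x_rows, y in cross_sections]
    # Step 2: average each coefficient over time, with a Newey-West SE.
    columns = [[b[j] for b in betas] for j in range(len(betas[0]))]
    means = [sum(c) / len(c) for c in columns]
    ses = [newey_west_se(c, max_lag) for c in columns]
    return means, ses


# Synthetic example: 40 hypothetical days, 50 hypothetical locations per day.
rng = random.Random(0)
cross_sections = []
for _ in range(40):
    x_rows = [[rng.gauss(10, 5), rng.gauss(60, 15)] for _ in range(50)]  # temp, RH
    y = [2.0 - 0.02 * temp - 0.01 * rh + rng.gauss(0, 0.1) for temp, rh in x_rows]
    cross_sections.append((x_rows, y))

means, ses = fama_macbeth(cross_sections, max_lag=3)
print("coefficients:", means)
print("Newey-West SEs:", ses)
```

Only the first-step point estimates feed the second step; the first-step SEs are never computed, which matches the description of the procedure in the text.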
The Fama-MacBeth regression with Newey-West adjustment has two advantages. (1) It avoids the spurious regression problem for non-stationary series, because the first-step coefficient estimates have much milder autocorrelations than the autocorrelations (time trends) within the observations themselves, and such autocorrelations can be adjusted for by the Newey-West procedure. (2) Only the cross-sectional coefficient estimates from the first step are used to estimate the final coefficients, not their SEs. Hence, heteroscedasticity and residual dependence in the first step (including dependence caused by spatial correlation) will not influence the final results, because neither alters the unbiasedness of the coefficients in ordinary least squares estimation. Online supplemental table 5 shows the detailed coefficients of temperature and relative humidity in the first step of the Fama-MacBeth regression.
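The unbiasedness argument in advantage (2) can be written out in one line. The matrix notation here is introduced only for this illustration: $X_t$ is the day-t design matrix, $R_t$ the vector of day-t R values, $\beta_t$ the true coefficient vector and $\Omega_t$ an arbitrary residual covariance matrix. The derivation assumes only that the residuals have zero conditional mean.

```latex
\hat{\beta}_t = (X_t' X_t)^{-1} X_t' R_t
             = \beta_t + (X_t' X_t)^{-1} X_t' \varepsilon_t ,
\qquad
\mathrm{E}\left[\hat{\beta}_t \mid X_t\right]
             = \beta_t + (X_t' X_t)^{-1} X_t' \, \mathrm{E}\left[\varepsilon_t \mid X_t\right]
             = \beta_t .
```

The covariance $\mathrm{Var}(\varepsilon_t \mid X_t) = \Omega_t$ does not appear anywhere in the expectation, so heteroscedastic or spatially correlated residuals change the precision of $\hat{\beta}_t$ but not its mean.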
Note that the Fama-MacBeth regression with Newey-West adjustment is commonly used in finance and economics to estimate model parameters in a way that remains valid in the presence of cross-sectional correlation and time-series autocorrelation.30–32 To the best of our knowledge, our study is a novel application of this method to emerging public health and epidemiological problems.
In our implementation, on each day of the study period, we perform a cross-sectional regression of the daily R values of various cities or counties based on their 6-day average temperature and relative humidity values, as well as several categories of control variables, including the following:
Demographics. The population density and the fraction of people aged 65 and older for both China and the USA.
Socioeconomic statuses. The GDP per capita for Chinese cities. For the US counties, the Gini index and the first principal component analysis factor derived from several factors including GDP per capita, personal income, the fraction of the population below the poverty level, the fraction of the population not in the labour force (16 years or over), the fraction of the population with a total household income more than US$200 000 and the fraction of the population with food stamp/SNAP benefits.
Geographical variables. Latitudes and longitudes.
Healthcare. The number of doctors in Chinese cities and the number of ICU beds per capita for US counties.
Human mobility status. For Chinese cities, the number of people who migrated from Wuhan in the 14 days prior to the R measurement and the drop rate of the Baidu Mobility Index compared with the same day in the first week of January 2020.22 For US counties, the ratio of the maximum moving distance to its median in normal times (weekdays from 17 February to 7 March) and home-stay minutes are used as mobility proxies. All human mobility controls are averaged over a 6-day period in the regression.
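As an illustration of how one day's cross-section might be assembled from the variables listed above, the sketch below averages daily weather over six days and stacks one row per location. The text does not state which six days enter each average, so the trailing window used here is an assumption. The dictionary layout, the field names and the choice of population density as the only control are likewise hypothetical.

```python
def trailing_mean(values, window=6):
    """Mean of the current and the previous window-1 values; None until a full window exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def build_cross_section(cities, day, window=6):
    """Return (x_rows, y) for one day: regressors and daily R value per location."""
    x_rows, y = [], []
    for city in cities.values():
        temp = trailing_mean(city["temp"], window)[day]
        rh = trailing_mean(city["rh"], window)[day]
        if temp is None or rh is None:
            continue  # not enough history for a 6-day average
        x_rows.append([temp, rh, city["density"]])
        y.append(city["R"][day])
    return x_rows, y


# Hypothetical input: two locations with seven days of data each.
cities = {
    "A": {"R": [2.0, 1.9, 1.8, 1.7, 1.6, 1.5, 1.4],
          "temp": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
          "rh": [50.0] * 7,
          "density": 100.0},
    "B": {"R": [1.5, 1.5, 1.4, 1.4, 1.3, 1.3, 1.2],
          "temp": [10.0] * 7,
          "rh": [60.0, 62.0, 64.0, 66.0, 68.0, 70.0, 72.0],
          "density": 250.0},
}

x_rows, y = build_cross_section(cities, day=5)
print(x_rows, y)
```

Each (x_rows, y) pair produced this way corresponds to one first-step cross-sectional regression.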
All analyses are conducted in Stata V.16.0.