Statistical analysis

Michael S. Jones, Jason A. Delborne, Johanna Elsensohn, Paul D. Mitchell, Zachary S. Brown

This protocol is extracted from research article:

Does the U.S. public support using gene drives in agriculture? And what do they want to know?

**Sci Adv**, Sep 11, 2019; DOI: 10.1126/sciadv.aau8462

Procedure

The survey responses analyzed as dependent variables in this study were, by design, as follows: support for agricultural gene drive applications (five-point Likert scale), support for gene drive inclusion in organic certification (five-point Likert scale), FAQ selection, and perceived relative importance of gene drive uncertainties (best-worst scaling, BWS, indicators for most/least important). For concise interpretation in the main text (Figs. 1 to 3 and Table 1), we aggregated the five-point Likert scales for support (which included a “don’t know” option) into a three-point scale: “agree” [Strongly Agree + Agree], “neutral” [Neither + Don’t Know], and “disagree” [Strongly Disagree + Disagree] [following the condensing in (*40*)]. Our statistical analysis used Wald tests of differences in subgroup means of these responses and generalized linear regression models to estimate the marginal effects of different gene drive factors and respondent characteristics on these outcomes.
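The three-point aggregation described above can be sketched in a few lines of Python. This is only an illustration: the response label strings, the function names, and the optional weights argument are assumptions, not the survey's actual coding.

```python
# Hypothetical response labels; the survey's exact wording is an assumption.
COLLAPSE = {
    "Strongly Agree": "agree",
    "Agree": "agree",
    "Neither": "neutral",
    "Don't Know": "neutral",
    "Disagree": "disagree",
    "Strongly Disagree": "disagree",
}


def collapse_likert(response):
    """Map a five-point Likert response (or "Don't Know") to the three-point scale."""
    return COLLAPSE[response]


def collapsed_shares(responses, weights=None):
    """Weighted share of responses falling in each collapsed category."""
    if weights is None:
        weights = [1.0] * len(responses)  # unweighted by default
    totals = {"agree": 0.0, "neutral": 0.0, "disagree": 0.0}
    for response, weight in zip(responses, weights):
        totals[collapse_likert(response)] += weight
    total_weight = sum(weights)
    return {category: value / total_weight for category, value in totals.items()}
```

Passing survey weights as `weights` would give weighted category shares; without them, each response counts equally.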

Ordered logit regression models were used to obtain statistical estimates for the ordinal, Likert-scale responses in Fig. 1 and Table 1. Because violations of the proportional odds assumption in ordered logit models are common in empirical work (*55*, *56*), partial proportional odds (PPO) ordered logit models were used where appropriate, estimated with the *gologit2* command in Stata. In a PPO ordered logit model, some β coefficients are constrained to be the same across dependent variable levels (as in a standard ordered logit model), while others are allowed to vary if the proportional odds assumption is rejected at the 0.05 significance level. In an example from (*55*), with *M* dependent variable levels indexed by *j*, the βs for *X*_{1} and *X*_{2} may be constrained, while the βs for *X*_{3} vary with *j*:

$$P(Y_i > j) = \frac{\exp(\alpha_j + X_{1i}\beta_1 + X_{2i}\beta_2 + X_{3i}\beta_{3j})}{1 + \exp(\alpha_j + X_{1i}\beta_1 + X_{2i}\beta_2 + X_{3i}\beta_{3j})}, \qquad j = 1, 2, \dots, M-1$$
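To make the PPO formula concrete, the following Python sketch evaluates P(Y > j) for one respondent and converts the result into category probabilities. It does not reproduce *gologit2*; the function names, argument layout, and any coefficient values passed in are illustrative assumptions.

```python
import math


def ppo_exceedance_probs(alphas, x, beta_constrained, beta_varying):
    """Return P(Y > j) for j = 1..M-1 under a partial proportional odds model.

    alphas           : list of M-1 intercepts alpha_j
    x                : dict of covariate values for one respondent
    beta_constrained : dict name -> one coefficient shared across all j
    beta_varying     : dict name -> list of M-1 coefficients, one per j
    """
    probs = []
    for j, alpha in enumerate(alphas):
        eta = alpha
        eta += sum(b * x[name] for name, b in beta_constrained.items())
        eta += sum(bs[j] * x[name] for name, bs in beta_varying.items())
        # 1 / (1 + exp(-eta)) is algebraically equal to exp(eta) / (1 + exp(eta))
        probs.append(1.0 / (1.0 + math.exp(-eta)))
    return probs


def category_probs(exceedance):
    """Convert P(Y > j) into P(Y = k) for k = 1..M by differencing."""
    bounds = [1.0] + list(exceedance) + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]
```

Because coefficients in `beta_varying` differ across *j*, the exceedance probabilities are not guaranteed to decrease in *j* for every covariate value, so the differenced category probabilities are worth checking for negative values.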

Ordinary least squares (OLS) models were used as robustness checks against the ordered logit models (table S8). OLS was also used to estimate marginal effects on the count of selected FAQs (Table 1), with a Tobit model used in robustness checks (table S5).

All estimation was done in Stata version 14. SEs for all regression coefficients (in Table 1 and used to estimate statistical precision in Figs. 1 to 3) accounted for GfK-provided survey weights and within-respondent clustering. Marginal effects for ordered logit regressions were obtained with Stata’s *margins* command, which estimates SEs using the delta method.
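As a rough illustration of the delta method mentioned above, the sketch below computes a logistic predicted probability and its approximate SE from a coefficient vector and its covariance matrix. It is not a reproduction of Stata's *margins* output: the function name is hypothetical, and the covariance matrix is assumed to already reflect the survey weights and respondent-level clustering.

```python
import math


def predicted_prob_with_se(x, beta, vcov):
    """Logistic predicted probability at x and its delta-method standard error.

    x    : list of covariate values (include 1.0 for an intercept)
    beta : list of coefficients, same length as x
    vcov : covariance matrix of beta as a list of lists (assumed supplied)
    """
    eta = sum(b * xi for b, xi in zip(beta, x))
    p = 1.0 / (1.0 + math.exp(-eta))
    # Gradient of p with respect to beta: dp/dbeta_k = p * (1 - p) * x_k
    grad = [p * (1.0 - p) * xi for xi in x]
    k = len(x)
    # Delta method: Var(p) is approximately grad' * vcov * grad
    var = sum(grad[i] * vcov[i][j] * grad[j] for i in range(k) for j in range(k))
    return p, math.sqrt(var)
```

The same first-order approximation applies to any smooth function of the coefficients; only the gradient changes.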

Exposing every respondent to the complete factorial of the three binary gene drive factors when eliciting general support ensured that these factors are not correlated with observed or unobserved respondent characteristics, reducing statistical bias and imprecision in estimates of these effects. The experimental design of the BWS exercise, together with random assignment of respondents to its three blocks, ensured that the subsamples presented with each block are statistically indistinguishable. The random ordering of the different drive types and of the BWS sets protected against bias from order effects in these measurements.

Weighted least-squares regression was used to statistically rank the uncertainty items (*29*). For this estimation procedure, the dependent variable was the total (sample-level) log frequency of each of the 10 × (10 − 1) = 90 possible most-least important pairs (i.e., “best-worst” pairs). The log selection frequency for each pair is a linear function of the difference in utility between its two items (*29*). Independent variables were 10 − 1 = 9 items (with cost-effectiveness as the reference), coded 1 for the pair’s “most important” item and −1 for the pair’s “least important” item. The regression weights were the frequencies with which each pair appears in the balanced incomplete block design.
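The weighted least-squares ranking described above can be sketched in plain Python as follows. Several details are assumptions made for illustration: the regression includes an intercept, every pair has a positive count (so its log is defined), items are indexed 0 to 9 with the reference item at index 0, and all function names are hypothetical.

```python
import math
from itertools import permutations


def design_row(best, worst, n_items, ref):
    """Intercept (assumed) plus +1/-1 codes for every item except the reference."""
    row = [1.0]
    for item in range(n_items):
        if item == ref:
            continue
        row.append(1.0 if item == best else -1.0 if item == worst else 0.0)
    return row


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def wls(rows, y, weights):
    """Weighted least squares via the normal equations (X'WX) b = X'Wy."""
    k = len(rows[0])
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * yi for r, yi, w in zip(rows, y, weights)) for i in range(k)]
    return solve(xtwx, xtwy)


def bws_scores(pair_counts, pair_weights, n_items=10, ref=0):
    """Regress log pair frequencies on the +1/-1 item codes.

    pair_counts  : dict (best, worst) -> positive selection count
    pair_weights : dict (best, worst) -> regression weight
    Returns the intercept and the scores of the non-reference items.
    """
    pairs = list(permutations(range(n_items), 2))  # 10 * 9 = 90 ordered pairs
    rows = [design_row(b, w, n_items, ref) for b, w in pairs]
    y = [math.log(pair_counts[p]) for p in pairs]
    weights = [pair_weights[p] for p in pairs]
    coef = wls(rows, y, weights)
    return coef[0], coef[1:]
```

Sorting the returned scores (with the reference item fixed at 0) gives the ranking of the items relative to the reference.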

To examine the impact of viewing the FAQs on gene drive attitudes, we approximated this impact with a linear two-stage instrumental variable (IV) regression model, estimated with the *cmp* command in Stata 14. The first stages were linear probability models (OLS) for the binary variables of viewing each of the seven FAQs. All demographic and consumption covariates were included in the first-stage models, along with the IVs: the random “forced” viewing of each unselected FAQ with (independent) one-third probability. The dependent variable in the second-stage OLS was the three-level Likert measure of support, neutrality, or opposition to gene drive applications; its regressors were all demographic and consumption variables in Table 1 and seven binary variables for ultimately seeing each FAQ (voluntarily or otherwise). A joint Wald test of all seven second-stage covariates for seeing the FAQs was not statistically significant (*P* = 0.37). These results are presented in table S10. Specifying the second stage as a multinomial probit model produced nearly identical results (joint test of FAQ covariates at *P* = 0.32).
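The two-stage logic can be illustrated with a deliberately simplified Python sketch: a single FAQ, a single binary instrument, and no covariates. This is not the *cmp* specification used in the study (which has seven endogenous viewing variables and the full covariate set); the variable names and the numeric coding of the three-level outcome are assumptions.

```python
def ols_simple(x, y):
    """Intercept and slope from regressing y on a single regressor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(z, d, y):
    """Point estimate of the effect of d on y using z as the instrument.

    z : 1 if the respondent was randomly forced to view the FAQ, else 0
    d : 1 if the respondent ultimately saw the FAQ, else 0
    y : outcome, e.g. 1 = oppose, 2 = neutral, 3 = support (hypothetical coding)
    """
    # First stage: linear probability model of viewing on the instrument.
    a0, a1 = ols_simple(z, d)
    d_hat = [a0 + a1 * zi for zi in z]
    # Second stage: outcome regressed on the fitted viewing probability.
    _, effect = ols_simple(d_hat, y)
    return effect
```

Running the two stages by hand like this yields the correct point estimate but not correct SEs, because the second-stage residuals are computed from fitted rather than observed values; dedicated IV routines adjust for this.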
