Logistic regression was used to test the significance of associations between several study variables (college major, housing choice, age, gender, and family history of depression) and the outcome (major depression as assessed by the PHQ-9). A conservative rule of thumb for logistic regression is to have at least 10 cases per study variable [16]. To have enough depression cases to support valid inferences, we targeted at least 60 cases. In addition to the five pre-specified variables (which were “forced into” a multiple logistic regression model), variables assessing cigarette use in the past 30 days and binge drinking in the past two weeks were “introduced” into the full model and were retained or removed via stepwise selection with a cutoff of p < 0.05. Interaction terms between gender and major, gender and housing, and housing and major were also tested; these were introduced into the full model and retained or removed based on the principles of “purposeful selection” [17]. The variables in the final model were assessed for multicollinearity by calculating the condition number and the variance inflation factor (VIF); values below 30 for the condition number and below 10 for the VIF were taken to indicate the absence of substantial multicollinearity [18].
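The VIF check described above can be illustrated with a minimal Python sketch. It is not the authors' code (the analyses were run in Stata 11); it relies on the standard identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The function names and the toy data are hypothetical, and the condition number is not computed here.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    # Augment the matrix with the identity; the right half becomes the inverse.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vifs(columns):
    """Return one VIF per predictor: the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Hypothetical toy data (NOT study data): age, gender (1 = female), housing (1 = on campus).
predictors = {
    "age": [18, 19, 20, 21, 22, 19, 20, 23],
    "female": [1, 0, 1, 1, 0, 0, 1, 0],
    "on_campus": [1, 1, 0, 1, 0, 1, 0, 0],
}
VIF_THRESHOLD = 10  # threshold stated in the text
for name, value in zip(predictors, vifs(list(predictors.values()))):
    print(f"{name}: VIF = {value:.2f}, below threshold: {value < VIF_THRESHOLD}")
```

In a real analysis the columns would be the design-matrix columns of the final model, including the dummy variables for college major.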
For the logistic regression analysis, housing arrangement was categorized as “on campus” (living in a residence hall or Greek housing) vs. “off campus” (university-affiliated apartment, parent’s home, or non-affiliated apartment or home), and college major was categorized as “STEM” (science, technology, engineering, and math), non-STEM, or undecided/interdisciplinary. Majors in the College of Sciences requiring a BS or majors in the Colleges of Biomedical Sciences, Engineering, and Computer Sciences were considered STEM majors. Undecided/interdisciplinary was used as the reference category for the logistic regression. After viewing the data, we chose to analyze arts and humanities majors as a separate category because of the high prevalence of depression in that group.
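The recoding described above can be sketched in Python as follows. This is an illustration only, not the authors' Stata code: the raw response strings, function names, and dummy-variable names are hypothetical, and the later split of arts and humanities into its own category is not shown (it would add one more level to the list).

```python
# Raw housing responses (hypothetical labels) grouped as described in the text.
ON_CAMPUS = {"residence hall", "greek housing"}
OFF_CAMPUS = {
    "university-affiliated apartment",
    "parent's home",
    "non-affiliated apartment or home",
}

def recode_housing(raw):
    """Collapse a raw housing response into 'on campus' or 'off campus'."""
    key = raw.strip().lower()
    if key in ON_CAMPUS:
        return "on campus"
    if key in OFF_CAMPUS:
        return "off campus"
    raise ValueError(f"unrecognized housing response: {raw!r}")

# The first level is the reference category and gets no dummy variable.
MAJOR_LEVELS = ["undecided/interdisciplinary", "STEM", "non-STEM"]

def dummy_code(category, levels=MAJOR_LEVELS):
    """Return 0/1 indicators for every level except the reference (levels[0])."""
    if category not in levels:
        raise ValueError(f"unrecognized major category: {category!r}")
    return {f"major_{level}": int(category == level) for level in levels[1:]}

print(recode_housing("Greek housing"))
print(dummy_code("STEM"))
print(dummy_code("undecided/interdisciplinary"))
```

A student in the reference category is coded 0 on every indicator, so each major coefficient in the model is interpreted relative to undecided/interdisciplinary students.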
Demographics were reported for the whole study population and separately for those scoring positive or negative for depression. Frequencies of demographic variables were compared between the depressed and non-depressed groups using chi-square tests or Fisher’s exact test (when a group had a count of less than 5). Statistical significance was set at p < 0.05 for all tests. All analyses were performed using Stata 11 (StataCorp, College Station, TX).
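The choice between the two tests can be sketched for a 2×2 table as below. This is a minimal Python illustration, not the authors' Stata code. It assumes a Pearson chi-square without continuity correction, a two-sided Fisher's exact test, and a switch to Fisher's test when any observed cell count is below 5. The function names are hypothetical, and variables with more than two levels (such as college major) would need a more general routine.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if 0 in (r1, r2, c1, c2):
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given the margins.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum all tables that are no more probable than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

def compare_groups(a, b, c, d, min_count=5):
    """Pick the test by the smallest cell count; return (test name, p-value)."""
    if min(a, b, c, d) < min_count:
        return "fisher", fisher_exact_2x2(a, b, c, d)
    return "chi-square", chi_square_2x2(a, b, c, d)[1]

# Hypothetical counts (NOT study data):
# rows = depressed / not depressed, columns = on campus / off campus.
test_name, p = compare_groups(10, 20, 20, 10)
print(test_name, p < 0.05)
```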