Baseline characteristics of cases and controls were compared using the Wilcoxon rank-sum test for continuous variables and the χ2-test for categorical variables. The signal of each metabolite was normalised within each batch to reduce between-batch variability, and missing values (below the limit of detection) were imputed with the minimum non-missing value. Metabolite values were log-transformed for analysis. Conditional logistic regression was used to estimate odds ratios (ORs) and their 95% confidence intervals (CIs) for the association between prostate cancer and an 80th percentile increase in each log-metabolite signal. Only matching factors were included in the final models. In sensitivity analyses, we additionally adjusted for BMI (<25 kg m−2, 25–30 kg m−2, or ⩾30 kg m−2) and several other potential confounding factors: smoking status (never, former, or current), diabetes (yes or no), height (<175 cm, 175–180 cm, or ⩾180 cm), physical activity (<1 h per week, 1–3 h per week, or ⩾4 h per week), alcohol consumption (<0.05 drinks per day, 0.05–1 drinks per day, or ⩾1 drinks per day), processed meat consumption (<6.6 g per day, 6.6–16.8 g per day, or ⩾16.8 g per day), red meat consumption (<20.4 g per day, 20.4–43.7 g per day, or ⩾43.7 g per day), and total fat intake (<60.3 g per day, 60.3–88 g per day, or ⩾88 g per day). With a Bonferroni correction for 695 tests, the threshold for statistical significance in our analysis was P=0.000072 (0.05/695). However, this threshold is overly conservative because many metabolites are inter-correlated. We therefore also used principal component analysis (Jolliffe, 2005) to explore whether grouped metabolites could distinguish case–control status.
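The preprocessing steps and the Bonferroni threshold described above can be illustrated with a minimal Python sketch. It is not the code used in the study, which ran these analyses in SAS. The text does not state how signals were normalised within a batch, so dividing by the batch median is an assumption here, as are the order of normalisation and imputation, the use of the natural logarithm, and the function names `preprocess` and `bonferroni_threshold`.

```python
import math
from statistics import median


def preprocess(signals, batches):
    """Normalise, impute, and log-transform the signals of ONE metabolite.

    signals: list of positive floats, with None marking a value below the
             limit of detection.
    batches: list of batch labels, same length as signals.
    """
    # Step 1 (assumed method): divide each observed signal by the median of
    # the observed signals in its batch.
    observed_by_batch = {}
    for value, batch in zip(signals, batches):
        if value is not None:
            observed_by_batch.setdefault(batch, []).append(value)
    batch_median = {b: median(v) for b, v in observed_by_batch.items()}
    normalised = [
        None if value is None else value / batch_median[batch]
        for value, batch in zip(signals, batches)
    ]

    # Step 2: replace missing values with the minimum non-missing value.
    floor = min(v for v in normalised if v is not None)
    imputed = [floor if v is None else v for v in normalised]

    # Step 3: log-transform (natural log is an assumption).
    return [math.log(v) for v in imputed]


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


example = preprocess([2.0, 4.0, None, 10.0, 30.0], ["a", "a", "a", "b", "b"])
threshold = bonferroni_threshold(0.05, 695)  # approximately 0.000072
```

In the example, the missing third value receives the smallest normalised signal observed for that metabolite before the log transform is applied.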
The top 10 principal components of the metabolite measurements were calculated, and the same conditional logistic regression approach (log-level with an 80th percentile increase) was applied to examine whether these components were associated with overall or aggressive prostate cancer. We also used a false-discovery rate (FDR) of 20% to define the significance threshold. We used gene-set analysis (GSA), a standard pathway analysis, to examine whether pre-defined metabolic super- and sub-pathways were associated with prostate cancer (Subramanian et al, 2005).
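As an illustration of how a 20% FDR threshold can be applied to a set of P-values, the sketch below uses the Benjamini-Hochberg step-up procedure. The text does not name the FDR method, so the choice of Benjamini-Hochberg is an assumption, and the function name `benjamini_hochberg` and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.20):
    """Return a list of booleans marking which tests are declared
    significant at the given false-discovery rate (Benjamini-Hochberg)."""
    m = len(p_values)
    # Indices of the tests ordered from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank

    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Hypothetical P-values for five metabolites.
flags = benjamini_hochberg([0.001, 0.30, 0.04, 0.9, 0.07], fdr=0.20)
```

Because the procedure compares each ordered P-value with a rank-dependent cutoff, it is less stringent than the Bonferroni correction when many metabolites are tested.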
We performed additional analyses restricted to aggressive or non-aggressive cases, non-Hispanic white men, black men, and men of other races, and analyses stratified by age at enrolment (<65 vs ⩾65 years) and follow-up time (<10 vs ⩾10 years). The analysis by race was conducted, in part, for comparison with prior studies of Caucasian populations (Mondul et al, 2014, 2015).
Analyses were performed with SAS software version 9.3 (SAS Institute, Cary, NC, USA), and the GSA was performed with the R statistical language version 3.2.3 (Vienna, Austria). All reported P-values are two-sided.