Short statistical methods

Ruth F. Dubin
Mary Whooley
Alexander Pico
Peter Ganz
Nelson B. Schiller
Craig Meyer

First, we examined baseline demographic characteristics and comorbidities in the whole cohort and in the groups with and without CKD. Categorical variables were compared using chi-square tests. For continuous variables, normality was evaluated by visual inspection and equality of variances by F-tests; Student’s t-tests, Welch’s t-tests, or Mann-Whitney tests were then used as appropriate to evaluate differences in characteristics by CKD status.
We selected 1068 aptamers that were designated “human” and evaluated these as potential predictors in the random survival forests. After normal standardization of the 1068 proteins, random survival forests[26] were used to model time to HF hospitalization. Random forest regression models have several advantages, including better integration of correlated variables, excellent predictive qualities[27, 28] and the capacity to incorporate interactions between variables and non-linear relationships in the model.[29]
The candidate proteins selected in the CKD subgroup were further reduced by fitting a Cox Least Absolute Shrinkage and Selection Operator (LASSO) regression model predicting heart failure in those with CKD. After eliminating less predictive proteins by the Cox LASSO method, we fit a separate Cox proportional hazards regression model for each of the final LASSO-selected proteins, predicting time to HF hospitalization in the CKD subgroup and successively adjusting for baseline measures of age (continuous, years), eGFR (continuous, ml/min/1.73 m²) and history of heart failure (yes / no). To verify our LASSO analysis, we performed a sensitivity analysis in which we repeated the Cox LASSO regression analyses in the CKD subgroup, this time using the set of proteins selected by the random survival forest in the full sample as potential predictors of heart failure hospitalization. Pearson’s correlation coefficients were estimated between proteins selected in the CKD subgroup and eGFR or albuminuria in the full sample of participants.
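The per-protein Cox models described above can be illustrated with a minimal sketch. The analyses in the source were run in R; the pure-Python code below is only an illustration and not the authors' code. It assumes z-score standardization, a single unadjusted protein (no age, eGFR or heart failure history terms), Breslow handling of tied event times, and a small hypothetical data set. The function names `standardize`, `partial_likelihood_terms` and `fit_cox_single` are ours.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Z-score a list of protein levels (mean 0, SD 1); assumed scaling."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def partial_likelihood_terms(beta, times, events, x):
    """Return (log partial likelihood, score, information) at beta.

    Breslow form: subjects with tied event times share the same risk set.
    """
    loglik = score = info = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        s0 = s1 = s2 = 0.0
        for t_j, x_j in zip(times, x):
            if t_j >= t_i:  # subject j is still at risk at time t_i
                w = math.exp(beta * x_j)
                s0 += w
                s1 += w * x_j
                s2 += w * x_j * x_j
        loglik += beta * x_i - math.log(s0)
        score += x_i - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return loglik, score, info


def fit_cox_single(times, events, x, tol=1e-8, max_iter=100):
    """Newton-Raphson with step-halving for a one-covariate Cox model."""
    beta = 0.0
    loglik, score, info = partial_likelihood_terms(beta, times, events, x)
    for _ in range(max_iter):
        step = score / info
        new = partial_likelihood_terms(beta + step, times, events, x)
        while new[0] < loglik and abs(step) > tol:
            step /= 2  # back off if the full Newton step overshoots
            new = partial_likelihood_terms(beta + step, times, events, x)
        beta += step
        loglik, score, info = new
        if abs(step) < tol:
            break
    return beta, 1 / math.sqrt(info)  # estimate and its standard error


# Hypothetical toy data: follow-up time, event flag (1 = HF hospitalization,
# 0 = censored) and the raw level of one protein.
times = [2, 3, 5, 7, 8, 11, 13, 14]
events = [1, 1, 0, 1, 1, 0, 1, 0]
protein = [2.1, 1.0, 1.6, 1.8, 0.9, 1.1, 0.7, 0.5]

z = standardize(protein)
beta, se = fit_cox_single(times, events, z)
hr = math.exp(beta)
ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
print(f"HR per 1 SD: {hr:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f})")
```

Because the protein is standardized, `exp(beta)` reads as the hazard ratio per one standard deviation increase. An adjusted model would add covariates and solve a small linear system at each Newton step instead of the scalar division used here.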
Additionally, we tested interactions between CKD (yes / no) and the top predictive proteins selected in the random survival forest model for the full sample, with a significance threshold of p<0.05 for the interaction term. Data management and statistical analyses were conducted using R version 3.3.0.[30] Pathway analyses were performed using Gene Ontology (geneontology.org) and WikiPathways (wikipathways.org). An organizational chart for these analyses is shown in Fig 1 (below). A full description of the Heart and Soul Study, statistical methods and pathway analyses is provided in S1 File, Expanded Methods.
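The interaction test described above can be written as a Cox model with a product term. The form below is a sketch under our own assumptions: a single standardized protein level $x$, a binary CKD indicator $c$ (1 = CKD, 0 = no CKD), and no further adjustment covariates. The symbols are ours and do not appear in the source.

```latex
h(t \mid x, c) = h_0(t)\,\exp\bigl(\beta_1 x + \beta_2 c + \beta_3\, x c\bigr),
\qquad H_0\colon \beta_3 = 0
```

Under this form, the hazard ratio per unit increase in $x$ is $\exp(\beta_1)$ in participants without CKD and $\exp(\beta_1 + \beta_3)$ in those with CKD, so rejecting $H_0$ at p<0.05 indicates that the protein's association with HF hospitalization differs by CKD status.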

Fig 1. This flowchart depicts our analytic plan. Analyses in boxes shaded grey were performed, but their results had lower relevance to the project aims and are therefore not included in the manuscript. H&S: Heart and Soul. LASSO: Least Absolute Shrinkage and Selection Operator. PH: proportional hazard.
