Based on the abundance changes estimated in our prior study of approximately 30 patients22, the sample size in this study offers sufficient power to detect changes at a false discovery rate of 10%, as calculated with the online power calculator “shinyMB” (https://fedematt.shinyapps.io/shinyMB/). We analyzed differential abundance at the phylum, family, genus, and species levels. To reduce the number of tests, we excluded rare taxa, defined as taxa present in fewer than 10% of samples or with a maximum proportion (relative abundance) below 0.2%. We fit a generalized linear model with an over-dispersed Poisson distribution to the count data. We estimated the library sizes (sequencing depth) using the Geometric Mean of Pairwise Ratios (GMPR) normalization method52, and the log of the GMPR size factors was used as an offset in the Poisson model to account for variable library sizes. Before fitting the model, the data were winsorized at the 97% upper quantile to reduce the possible impact of outliers on parameter estimates. To improve our power to detect differentially abundant taxa, we pooled the cervix and vagina data, which showed consistent changes. We assessed statistical significance using the Wald test and corrected for multiple testing by controlling the false discovery rate (FDR) with the Benjamini-Hochberg (B-H) procedure (“p.adjust” in the R “stats” package). FDR-adjusted p values (q values) less than 0.05 were considered significant.
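As an illustration only, the filtering, winsorization, and FDR steps described above can be sketched in Python. This is not the code used in the study, which relied on R. The function names, the input layout (a dictionary mapping each taxon to its per-sample counts), the use of pre-filtering totals as the denominator for proportions, per-taxon winsorization, and the linear-interpolation quantile are all assumptions made for the sketch. The GMPR normalization and the over-dispersed Poisson model are not reproduced here.

```python
import math


def filter_taxa(counts, min_prevalence=0.10, min_max_prop=0.002):
    """Return taxa passing both filters; counts maps taxon -> per-sample counts."""
    taxa = list(counts)
    n_samples = len(counts[taxa[0]])
    # Total reads per sample (assumption: computed before any filtering).
    depth = [sum(counts[t][j] for t in taxa) for j in range(n_samples)]
    kept = []
    for t in taxa:
        row = counts[t]
        prevalence = sum(c > 0 for c in row) / n_samples
        max_prop = max((c / d for c, d in zip(row, depth) if d > 0), default=0.0)
        # A taxon is dropped if it fails either threshold.
        if prevalence >= min_prevalence and max_prop >= min_max_prop:
            kept.append(t)
    return kept


def winsorize_upper(values, q=0.97):
    """Cap values above the q-th quantile (linear interpolation, an assumption)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    cap = s[lo] + (s[hi] - s[lo]) * (pos - lo)
    return [min(v, cap) for v in values]


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values (q values), in the input order."""
    m = len(pvals)
    # Walk from the largest p value down, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # rank of pvals[i] in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical Wald-test p values for four taxa.
    q_values = bh_adjust([0.01, 0.04, 0.03, 0.005])
    significant = [q < 0.05 for q in q_values]
    print(q_values, significant)
```

The `bh_adjust` function follows the same step-up logic as the B-H option of “p.adjust”; in a real analysis the R function itself would be the natural choice.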