Differential abundance analysis

Dana M. Walsh; Alexis N. Hokenstad; Jun Chen; Jaeyun Sung; Gregory D. Jenkins; Nicholas Chia; Heidi Nelson; Andrea Mariani; Marina R. S. Walther-Antonio

This method is extracted from research article: Sci Rep, Dec 2019

Postmenopause as a key factor in the composition of the Endometrial Cancer Microbiome (ECbiome)

DOI: 10.1038/s41598-019-55720-8

Based on the abundance changes estimated in our prior study of approximately 30 patients²², the sample size in this study provides sufficient power to detect changes at a false discovery rate of 10%, as assessed with the online power calculator “shinyMB” (https://fedematt.shinyapps.io/shinyMB/). We analyzed differential abundance at the phylum, family, genus, and species levels. To reduce the number of tests, we filtered out rare taxa, defined as taxa present in fewer than 10% of samples or with a maximum proportion (relative abundance) below 0.2%.

We fit a generalized linear model with an over-dispersed Poisson distribution to the count data. Library sizes (sequencing depths) were estimated with the Geometric Mean of Pairwise Ratios (GMPR) normalization method⁵², and the log of the GMPR size factors was included in the Poisson model as an offset to account for variable library sizes. Before fitting the model, the data were winsorized at the 97% upper quantile (values above that quantile were replaced by it) to reduce the possible impact of outliers on parameter estimates. To improve power to detect differential taxa, we pooled the cervix and vagina data, which showed consistent changes between the two sites.

We assessed statistical significance using the Wald test and corrected for multiple testing by controlling the false discovery rate (FDR) with the Benjamini-Hochberg (B-H) procedure (“p.adjust” in the R “stats” package). FDR-adjusted p values (q values) less than 0.05 were considered significant.
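
The preprocessing steps above can be sketched in Python with only the standard library. This is a minimal, hypothetical illustration, not the authors' code: it assumes the count table is a dict mapping each taxon to its per-sample counts, and the function names are invented for the example. GMPR follows its published definition (for each sample, the geometric mean over all samples of the median count ratio on taxa that are nonzero in both); the reference R implementation may apply extra thresholds that are omitted here. Winsorizing each taxon on the count scale is an assumption, since the text does not say whether counts or proportions were capped.

```python
import math
import statistics

# Assumed layout: counts = {taxon: [count in sample 0, count in sample 1, ...]}


def filter_taxa(counts, min_prevalence=0.10, min_max_prop=0.002):
    """Keep taxa present in >= min_prevalence of samples AND whose maximum
    relative abundance across samples is >= min_max_prop."""
    n_samples = len(next(iter(counts.values())))
    totals = [sum(c[i] for c in counts.values()) for i in range(n_samples)]
    kept = {}
    for taxon, c in counts.items():
        prevalence = sum(1 for v in c if v > 0) / n_samples
        max_prop = max((v / t for v, t in zip(c, totals) if t > 0), default=0.0)
        if prevalence >= min_prevalence and max_prop >= min_max_prop:
            kept[taxon] = c
    return kept


def gmpr_size_factors(counts):
    """Size factor for sample i: geometric mean over samples j (including i)
    of median(c_ik / c_jk) over taxa k nonzero in both samples."""
    rows = list(counts.values())
    n_samples = len(rows[0])
    factors = []
    for i in range(n_samples):
        log_medians = []
        for j in range(n_samples):
            ratios = [r[i] / r[j] for r in rows if r[i] > 0 and r[j] > 0]
            if ratios:  # skip sample pairs that share no nonzero taxa
                log_medians.append(math.log(statistics.median(ratios)))
        factors.append(
            math.exp(sum(log_medians) / len(log_medians)) if log_medians else float("nan")
        )
    return factors


def quantile_type7(values, p):
    """Linear-interpolation quantile, the default (type 7) of R's quantile()."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def winsorize_upper(values, upper=0.97):
    """Replace values above the `upper` quantile with that quantile."""
    cap = quantile_type7(values, upper)
    return [min(v, cap) for v in values]
```

The order of operations here (filter, then estimate size factors, then winsorize each taxon) is one plausible reading of the text rather than a documented pipeline. Because GMPR uses only ratios between samples, multiplying every count in a sample by a constant multiplies its size factor by the same constant.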
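
The testing step can be sketched in the same way. This hypothetical example deliberately reduces the model to a single two-group covariate with no other terms (the full model specification is not given here), because that case has a closed-form maximum-likelihood fit: with log size factors as the offset, each group's fitted rate is its total count divided by its total size factor. Over-dispersion is estimated as the Pearson chi-square divided by the residual degrees of freedom, which is how R's summary.glm estimates the quasipoisson dispersion. The Wald p value here uses a normal reference, whereas summary.glm uses a t distribution when the dispersion is estimated. bh_adjust follows the step-up rule of p.adjust(method = "BH").

```python
import math


def quasi_poisson_wald(y, size_factors, group):
    """Two-group quasi-Poisson GLM: log(mu_i) = log(s_i) + b0 + b1 * g_i,
    with g_i in {0, 1}. Returns (b1, standard error, two-sided Wald p)."""
    n = len(y)
    y0 = sum(v for v, g in zip(y, group) if g == 0)
    y1 = sum(v for v, g in zip(y, group) if g == 1)
    s0 = sum(s for s, g in zip(size_factors, group) if g == 0)
    s1 = sum(s for s, g in zip(size_factors, group) if g == 1)
    if y0 <= 0 or y1 <= 0:
        raise ValueError("a group with zero total count has no finite MLE")
    # Closed-form MLE: fitted rate per group = total count / total size factor.
    rate0, rate1 = y0 / s0, y1 / s1
    b1 = math.log(rate1 / rate0)  # log fold change between groups
    mu = [s * (rate1 if g == 1 else rate0) for s, g in zip(size_factors, group)]
    # Dispersion: Pearson chi-square / residual df (2 parameters estimated).
    phi = sum((v - m) ** 2 / m for v, m in zip(y, mu)) / (n - 2)
    # Poisson inverse Fisher information for b1 is 1/y0 + 1/y1 at the MLE.
    se = math.sqrt(phi * (1.0 / y0 + 1.0 / y1))
    if se == 0.0:  # degenerate perfect fit
        return b1, se, (1.0 if b1 == 0.0 else 0.0)
    z = b1 / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return b1, se, p


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values (q values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q
```

Applied across all retained taxa, the p values from quasi_poisson_wald would be passed to bh_adjust, and taxa with q values below 0.05 would be called differentially abundant, matching the threshold stated above.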

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.