Bonferroni Correction for the Multiple Significance Tests (Dunn’s Method)

Olga V. Artemyeva-Isman
Andrew C. G. Porter

The danger of testing multiple hypotheses is that some "significant" results may occur by chance alone (Bland and Altman, 1995). The simple Bonferroni correction, or Dunn's α-splitting (Lee and Lee, 2018), divides the widely used significance threshold α = 0.05 by the number of tests m performed on each dataset, giving the per-test threshold α′ = α/m.
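As a minimal sketch of the α-splitting rule, the Python below computes α′ = α/m and flags the p-values that fall below it. The function names and the four p-values are hypothetical and chosen only for illustration; they are not taken from this study.

```python
from typing import List, Sequence


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test threshold alpha' = alpha / m for m tests on one dataset."""
    if m < 1:
        raise ValueError("m must be a positive number of tests")
    return alpha / m


def significant(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag each p-value that falls strictly below the Bonferroni-corrected threshold."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Hypothetical p-values from m = 4 tests on one dataset (illustrative only).
p_values = [0.003, 0.02, 0.0004, 0.3]
print(bonferroni_alpha(0.05, len(p_values)))  # 0.0125
print(significant(p_values))  # [True, False, True, False]
```

Note that p = 0.02 would pass the uncorrected 0.05 threshold but fails the corrected one (0.0125), which is exactly the chance finding the correction is meant to guard against.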

Accordingly, the corrected p-value thresholds for significant changes in base pair frequencies are as follows: for the +5G/+5Gsub experiment, m = 31 and α′ = 0.0016; for the −1G/−1Gsub experiment, m = 17 and α′ = 0.0029; for the −3C/−3Csub experiment, m = 28 and α′ = 0.0018; and for all tests in this study combined, m = 76 and α′ = 0.0007. In Figures 5 and 8, p-values below the study-wide threshold (all tests) are marked with triple asterisks, and p-values below the threshold of their individual experiment are marked with double asterisks.
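The thresholds above can be reproduced with a short sketch. It assumes the marking rule reads as "***" below the study-wide α′ and "**" below the experiment's own α′; the helper names, the ASCII experiment labels, and the example p-values are hypothetical.

```python
ALPHA = 0.05

# Number of base pair frequency tests per experiment, as stated in the text.
TESTS = {"+5G/+5Gsub": 31, "-1G/-1Gsub": 17, "-3C/-3Csub": 28}
M_ALL = sum(TESTS.values())  # 76 tests across the whole study


def threshold(m: int, alpha: float = ALPHA) -> float:
    """Bonferroni (Dunn) threshold alpha' = alpha / m."""
    return alpha / m


def stars(p: float, experiment: str) -> str:
    """Assumed marking rule: '***' below the study-wide threshold,
    '**' below the experiment's own threshold, otherwise no mark."""
    if p < threshold(M_ALL):
        return "***"
    if p < threshold(TESTS[experiment]):
        return "**"
    return ""


for name, m in TESTS.items():
    print(f"{name}: m = {m}, alpha' = {threshold(m):.4f}")
print(f"all tests: m = {M_ALL}, alpha' = {threshold(M_ALL):.4f}")

# Hypothetical p-values for the +5G/+5Gsub experiment (illustrative only).
print(stars(0.0005, "+5G/+5Gsub"))  # ***
print(stars(0.001, "+5G/+5Gsub"))  # **
print(repr(stars(0.002, "+5G/+5Gsub")))  # ''
```

The printed thresholds (0.0016, 0.0029, 0.0018 and 0.0007) match the values given above; the study-wide threshold is checked first because it is the strictest.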

Dunn's application of the Bonferroni correction is stringent: it is more likely to reject a true positive (Type II error) than to accept a false positive (Type I error) (Lee and Lee, 2018). Its use is justified when the outcomes of the hypothesis tests are independent. The comparisons here are independent across site positions but strongly correlated across base pair types at each position, because the pair-type frequencies at a position sum to 1. For example, an increase in Watson–Crick pairs implies a decrease in isosteric pairs if only two pair types occur at a position, or a decrease in non-isosteric pairs, isosteric pairs, or both if three pair types occur. Therefore, we can adjust m for the correlated tests (Shi et al., 2012):

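$$m' = m - (m - 1)R$$

where R is the interclass correlation correction, such that 0 ≤ R ≤ 1: with R = 0 the m tests count in full, and with R = 1 they count as a single test.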
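One way to make the correlation values used for R concrete (our illustration, not necessarily the authors' derivation) is to note that if k pair types are exchangeable and their frequencies at a position sum to 1, the variance of that sum is zero, which forces every pairwise correlation to equal exactly −1/(k − 1): −1, −0.5 and −0.33 for k = 2, 3 and 4. The sketch below checks this by simulation under an assumed flat Dirichlet model for the frequencies, chosen only because it makes the types exchangeable; `pearson` and `pairwise_correlation` are hypothetical helper names.

```python
import math
import random
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlation(k: int, n: int = 20000, seed: int = 1) -> float:
    """Simulated correlation between two of k pair-type frequencies summing to 1.

    Each composition is k independent Exp(1) draws divided by their sum
    (a flat Dirichlet), so the k pair types are exchangeable by construction.
    """
    rng = random.Random(seed)
    first: List[float] = []
    second: List[float] = []
    for _ in range(n):
        draws = [rng.expovariate(1.0) for _ in range(k)]
        total = sum(draws)
        first.append(draws[0] / total)
        second.append(draws[1] / total)
    return pearson(first, second)


for k in (2, 3, 4):
    # Simulated correlation printed next to the exact value -1/(k - 1).
    print(k, round(pairwise_correlation(k), 3), round(-1 / (k - 1), 3))
```

The magnitudes 1, 0.5 and 0.33 coincide with the R values used for two, three and four pair types; treating the magnitude of this correlation as R is an assumption of the sketch.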

In simple terms, a position with only two pair types contributes two perfectly correlated tests, so R = 1 and the two tests count as one. For positions with three pair types, we approximate R ≈ 0.5 by splitting the correlation between them, and R ≈ 0.33 for positions with four pair types. Following this procedure, for the +5G/+5Gsub experiment, m′ = 20 and α′ = 0.0025; for the −1G/−1Gsub experiment, m′ = 11 and α′ = 0.0045; and for the −3C/−3Csub experiment, m′ = 17 and α′ = 0.0029. In Figures 5 and 8, p-values below these experiment-specific thresholds for correlated tests are marked with a single asterisk.
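The per-position procedure can be sketched as follows, assuming the adjustment takes the linear form m′ = m − (m − 1)R applied to the tests at each position and summed over positions, with R = 1/(k − 1) for k pair types (the text rounds this to 0.33 for k = 4). The position layout in the example is hypothetical; only the m′ values 20, 11 and 17 come from the text.

```python
from typing import Dict, Sequence

ALPHA = 0.05


def effective_tests(m: int, r: float) -> float:
    """Assumed linear correction m' = m - (m - 1) * R for m correlated tests."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("R must lie in [0, 1]")
    return m - (m - 1) * r


def experiment_m_prime(pair_types_per_position: Sequence[int]) -> float:
    """Sum effective tests over positions, taking R = 1/(k - 1) for k pair types."""
    total = 0.0
    for k in pair_types_per_position:
        r = 1.0 / (k - 1) if k > 1 else 0.0
        total += effective_tests(k, r)
    return total


# Hypothetical site with positions carrying 2, 3, 4 and 3 pair types:
# 12 tests in total, which collapse to 1 + 2 + 3 + 2 = 8 effective tests.
print(experiment_m_prime([2, 3, 4, 3]))  # 8.0

# Effective test counts m' reported in the text, and the resulting thresholds.
M_PRIME: Dict[str, int] = {"+5G/+5Gsub": 20, "-1G/-1Gsub": 11, "-3C/-3Csub": 17}
for name, m_prime in M_PRIME.items():
    print(f"{name}: m' = {m_prime}, alpha' = {ALPHA / m_prime:.4f}")
```

With R = 1/(k − 1), each position contributes k − 1 effective tests, i.e. one test is absorbed per position by the constraint that the frequencies sum to 1. The printed thresholds (0.0025, 0.0045 and 0.0029) match those stated above.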

Finally, we note the ongoing debate among statisticians about "statistical significance": McShane et al. (2019) argue that null-hypothesis significance testing, together with the conventional p-value threshold of 0.05, is a misleading paradigm for research, and that the p-value should not be given priority over other factors, such as the plausibility of the mechanism and related prior evidence (in this case, genomic conservation and mutation data).
