Hi fellows!
This is the second part of the article about a common problem in statistics: multiple comparisons.
In the first part, we covered the main terminology of this problem and the most common solutions. In this part, we will explore a practical implementation in Python and interpret the results.
Let's get started!
Practical Implementation
First of all, let's make sure that all the necessary libraries are installed:
pip install numpy statsmodels
Bonferroni correction
#import libraries
from statsmodels.stats.multitest import multipletests
import numpy as np
# Imagine these are your p-values from testing various hypotheses
p_values = [0.005, 0.0335, 0.098543, 0.00123] # Let's say we did 4 tests
# Applying Bonferroni correction
bonf_rejected, bonf_corrected, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')
# alpha is the significance level, i.e. the Type I error rate we want to control (see the previous article)
print("Bonferroni Approach")
print(f"Rejected: {bonf_rejected}")
print(f"Adjusted p-values: {bonf_corrected}\n")
Let's break down what we've got after applying the Bonferroni correction to your p-values:
Bonferroni Approach
Rejected: [ True False False True]
Adjusted p-values: [0.02 0.134 0.394172 0.00492 ]
- Rejected hypotheses: The Bonferroni correction tells us which hypotheses should be rejected based on the corrected threshold. Here, the first and last hypotheses are rejected (both are True). This means that even after correcting for multiple comparisons, these hypotheses show statistically significant results.
- Adjusted p-values: The adjusted p-values are [0.02, 0.134, 0.394172, 0.00492]. Each one corresponds to the original p-value in the same position. The Bonferroni correction multiplies every p-value by the number of tests (here 4, capped at 1), which raises them to control the increased risk of Type I errors (false positives) that comes with multiple testing.
- Interpretation: The original p-values [0.005, 0.00123] become [0.02, 0.00492] after correction. They remain below the threshold of 0.05, so these findings are statistically significant.
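To see where these numbers come from, here is a minimal sketch that reproduces the Bonferroni adjustment by hand with NumPy. It is only an illustration of the formula, not a copy of how statsmodels implements it; the variable names `m`, `adjusted`, and `rejected` are my own.

```python
import numpy as np

p_values = np.array([0.005, 0.0335, 0.098543, 0.00123])
alpha = 0.05
m = len(p_values)  # number of tests (4 in our example)

# Bonferroni: multiply each p-value by the number of tests and cap the result at 1
adjusted = np.minimum(p_values * m, 1.0)

# A hypothesis is rejected when its adjusted p-value does not exceed alpha
rejected = adjusted <= alpha

print(f"Adjusted p-values: {adjusted}")
print(f"Rejected: {rejected}")
```

Comparing the adjusted p-values with 0.05 gives the same decisions as comparing the original p-values with 0.05 / 4 = 0.0125.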
Benjamini-Hochberg correction
# Benjamini-Hochberg correction for the brave
from statsmodels.stats.multitest import multipletests
import numpy as np
# Imagine these are your p-values from testing various hypotheses
p_values = [0.005, 0.0335, 0.098543, 0.00123] # Let's say we did 4 tests
# Applying BH correction
bh_rejected, bh_corrected, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')
print("Benjamini-Hochberg Approach")
print(f"Rejected: {bh_rejected}")
print(f"Adjusted p-values: {bh_corrected}")
Let's break down what we've got after applying the Benjamini-Hochberg correction to your p-values:
Benjamini-Hochberg Approach
Rejected: [ True True False True]
Adjusted p-values: [0.01 0.04466667 0.098543 0.00492 ]
- Rejected hypotheses: [True, True, False, True] indicates which hypotheses were rejected based on the adjusted p-values. In this case, the 1st, 2nd, and 4th hypotheses were rejected, suggesting significant findings in these cases.
- Adjusted p-values: [0.01, 0.04466667, 0.098543, 0.00492] are the Benjamini-Hochberg adjusted p-values of each hypothesis. These values are compared against the alpha level (in this case, 0.05) to determine which hypotheses are rejected. Note that with this method alpha bounds the false discovery rate (the expected share of false positives among the rejected hypotheses), not the chance of making any false positive at all.
- Interpretation: The original p-values [0.005, 0.0335, 0.00123] become [0.01, 0.04466667, 0.00492] after correction. They remain below the threshold of 0.05, so these findings are statistically significant.
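The Benjamini-Hochberg numbers can also be reproduced by hand. The sketch below follows the usual textbook recipe (sort the p-values, scale each by m / rank, then take a running minimum from the largest rank down); it is meant as an illustration rather than a copy of the statsmodels internals, and the names `order`, `scaled`, `bh_adjusted`, and `bh_rejected` are my own.

```python
import numpy as np

p_values = np.array([0.005, 0.0335, 0.098543, 0.00123])
alpha = 0.05
m = len(p_values)

# Indices that sort the p-values in ascending order (rank 1 = smallest)
order = np.argsort(p_values)

# Scale the i-th smallest p-value by m / i
scaled = p_values[order] * m / np.arange(1, m + 1)

# Keep the adjusted values monotone: running minimum from the largest rank down
scaled = np.minimum.accumulate(scaled[::-1])[::-1]
scaled = np.minimum(scaled, 1.0)

# Put the adjusted values back in the original order of the hypotheses
bh_adjusted = np.empty(m)
bh_adjusted[order] = scaled
bh_rejected = bh_adjusted <= alpha

print(f"Adjusted p-values: {bh_adjusted}")
print(f"Rejected: {bh_rejected}")
```

Unlike Bonferroni, the multiplier depends on the rank: only the smallest p-value is multiplied by the full number of tests, which is why this correction is less strict.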
Interpretation of the Results in Celebrity Terms
- First and Fourth Hypotheses: Both approaches reject these hypotheses. This is a strong sign that we have found the real celebrities!
- Second Hypothesis: This hypothesis is recognized only by the Benjamini-Hochberg method. Maybe this is a lesser-known celebrity in whom the Benjamini-Hochberg method (with its inherent positivity) saw big potential. The ultraconservative Bonferroni, however, prefers to be careful and misses the chance for fear of a false positive.
This metaphor highlights the inherent trade-offs between sensitivity and specificity in statistical corrections and the importance of choosing the right approach based on the context of your research or, in our playful analogy, the type of party you are attending.
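To make the trade-off concrete, the following sketch puts the cutoffs of both methods side by side for our four p-values. It assumes the standard formulation of both rules (a single cutoff of alpha / m for Bonferroni, and a step-up rule with rank-dependent cutoffs rank * alpha / m for Benjamini-Hochberg); the helper names `ranks`, `bh_cutoffs`, and `k_max` are my own.

```python
p_values = [0.005, 0.0335, 0.098543, 0.00123]
alpha = 0.05
m = len(p_values)

# Bonferroni uses the same cutoff for every test
bonf_cutoff = alpha / m
bonf_reject = [p <= bonf_cutoff for p in p_values]

# Rank of each hypothesis when p-values are sorted ascending (1 = smallest)
sorted_idx = sorted(range(m), key=lambda i: p_values[i])
ranks = {idx: rank for rank, idx in enumerate(sorted_idx, start=1)}

# Benjamini-Hochberg cutoff grows with the rank
bh_cutoffs = [ranks[i] * alpha / m for i in range(m)]

# Step-up rule: find the largest rank whose p-value is under its cutoff,
# then reject every hypothesis with a rank up to and including that one
k_max = max((ranks[i] for i in range(m) if p_values[i] <= bh_cutoffs[i]), default=0)
bh_reject = [ranks[i] <= k_max for i in range(m)]

for i, p in enumerate(p_values):
    print(
        f"H{i + 1}: p={p:.6f} | Bonferroni cutoff={bonf_cutoff:.4f} -> {bonf_reject[i]} "
        f"| BH cutoff={bh_cutoffs[i]:.4f} -> {bh_reject[i]}"
    )
```

The second hypothesis (p = 0.0335) is the third smallest, so its Benjamini-Hochberg cutoff is 3 * 0.05 / 4 = 0.0375, while the Bonferroni cutoff stays at 0.0125 for everyone. That gap is exactly why only one of the two methods lets this guest into the party.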
Wrapping It Up: The Takeaway
Testing a lot of hypotheses is like walking through a field full of potential mistakes. But with the right tools (thanks, Python!) and strategies (hello, Bonferroni and Benjamini-Hochberg), you can manage this process and keep your research sound. Remember, it's all about balancing risk and reward. Whether you are being extra cautious or going for big discoveries, properly handling multiple tests will help make your results more trustworthy.
Have a good data hunt!