Hi fellows!
This is the second part of my article on a common problem in statistics: multiple comparisons.
In the first part, we dived into the main terminology of this problem and the most common solutions. In this article, we will walk through a practical implementation in Python and interpret the results.
Let's get started!
Practical Implementation
First of all, let’s make sure that all the necessary libraries are installed:
pip install numpy statsmodels
Bonferroni correction
# Import libraries
from statsmodels.stats.multitest import multipletests
import numpy as np
# Imagine these are your p-values from testing various hypotheses
p_values = [0.005, 0.0335, 0.098543, 0.00123]  # Let's say we ran 4 tests
# Applying the Bonferroni correction
# alpha is the acceptable Type I error rate for the whole family of tests (see the previous article)
bonf_rejected, bonf_corrected, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')
print("Bonferroni Approach")
print(f"Rejected: {bonf_rejected}")
print(f"Adjusted p-values: {bonf_corrected}\n")
Let's break down what we got after applying the Bonferroni correction to our p-values:
Bonferroni Approach
Rejected: [ True False False True]
Adjusted p-values: [0.02 0.134 0.394172 0.00492 ]
Rejected hypotheses: The Bonferroni correction tells us which hypotheses should be rejected at the corrected threshold, which is alpha divided by the number of tests (0.05 / 4 = 0.0125). Here, the first (True) and last (True) hypotheses are rejected. This means that even after correcting for multiple comparisons, these hypotheses show statistically significant results.
Adjusted p-values: The adjusted p-values are [0.02, 0.134, 0.394172, 0.00492]. Each one corresponds to the original p-value in the same position. The Bonferroni correction multiplies every p-value by the number of tests (here 4, capping the result at 1), which makes the p-values higher and controls for the increased risk of Type I errors (false positives) that comes with multiple testing.
Interpretation: For the original p-values [0.005, 0.00123], we got [0.02, 0.00492] after correction. These adjusted p-values remain below the threshold of 0.05, indicating that the findings are statistically significant.
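To see what `multipletests` is doing under the hood, here is a minimal sketch that reproduces the Bonferroni adjustment by hand with NumPy. The function `bonferroni` is a hypothetical helper written for illustration (it is not part of statsmodels); the sketch assumes the same four p-values and alpha = 0.05 as above and cross-checks its answer against `multipletests`.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests


def bonferroni(p_values, alpha=0.05):
    """Hypothetical helper: Bonferroni adjustment written out by hand."""
    p = np.asarray(p_values, dtype=float)
    m = p.size  # number of hypotheses tested
    # Adjusted p-value: multiply by the number of tests, capped at 1
    adjusted = np.minimum(p * m, 1.0)
    # Equivalent decision rule: compare the ORIGINAL p-value with alpha / m
    rejected = p <= alpha / m
    return rejected, adjusted


p_values = [0.005, 0.0335, 0.098543, 0.00123]
rejected, adjusted = bonferroni(p_values, alpha=0.05)
print(f"Per-test threshold: {0.05 / len(p_values)}")
print(f"Rejected: {rejected}")
print(f"Adjusted p-values: {adjusted}")

# Cross-check against statsmodels
sm_rejected, sm_adjusted, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')
assert np.array_equal(rejected, sm_rejected)
assert np.allclose(adjusted, sm_adjusted)
```

Both views lead to the same decisions: comparing the adjusted p-values with 0.05 is the same as comparing the original p-values with 0.05 / 4.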
Benjamini-Hochberg correction
# Benjamini-Hochberg correction for the brave
from statsmodels.stats.multitest import multipletests
import numpy as np
# Imagine these are your p-values from testing various hypotheses
p_values = [0.005, 0.0335, 0.098543, 0.00123]  # Let's say we ran 4 tests
# Applying the BH correction, which controls the false discovery rate (FDR) at alpha
bh_rejected, bh_corrected, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')
print("Benjamini-Hochberg Approach")
print(f"Rejected: {bh_rejected}")
print(f"Adjusted p-values: {bh_corrected}")
Let's break down what we got after applying the Benjamini-Hochberg correction to our p-values:
Benjamini-Hochberg Approach
Rejected: [ True True False True]
Adjusted p-values: [0.01 0.04466667 0.098543 0.00492 ]
Rejected hypotheses: [True, True, False, True] indicates which hypotheses were rejected based on the adjusted p-values. In this case, the 1st, 2nd, and 4th hypotheses were rejected, suggesting significant findings in these cases.
Adjusted p-values: [0.01, 0.04466667, 0.098543, 0.00492] are the Benjamini-Hochberg adjusted p-values, listed in the original order. Instead of multiplying every p-value by the number of tests, the method multiplies it by the number of tests divided by its rank among the sorted p-values: for example, 0.0335 has rank 3, so it becomes 0.0335 × 4 / 3 ≈ 0.04467. Each adjusted value is then compared against the alpha level (in this case, 0.05) to determine which hypotheses are rejected.
Interpretation: For the original p-values [0.005, 0.0335, 0.00123], we got [0.01, 0.04466667, 0.00492] after correction. They remain below the threshold of 0.05, indicating that the findings are statistically significant.
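For comparison, here is a minimal sketch of the Benjamini-Hochberg step-up procedure written out by hand. The function `benjamini_hochberg` is a hypothetical helper for illustration, not a statsmodels API; it computes both the step-up decision rule and the adjusted p-values, and checks the results against `multipletests(..., method='fdr_bh')` on the same four p-values.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests


def benjamini_hochberg(p_values, alpha=0.05):
    """Hypothetical helper: Benjamini-Hochberg step-up procedure written out by hand."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)            # positions of the p-values from smallest to largest
    p_sorted = p[order]
    ranks = np.arange(1, m + 1)      # rank i = 1..m of each sorted p-value

    # Step-up rule: find the largest rank k with p_(k) <= k * alpha / m,
    # then reject the k hypotheses with the smallest p-values
    below = p_sorted <= ranks * alpha / m
    k = np.nonzero(below)[0].max() + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True

    # Adjusted p-values: p_(i) * m / i, made non-decreasing by taking a running
    # minimum from the largest p-value downwards, then capped at 1
    scaled = p_sorted * m / ranks
    adjusted_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted  # restore the original order
    return rejected, adjusted


p_values = [0.005, 0.0335, 0.098543, 0.00123]
rejected, adjusted = benjamini_hochberg(p_values, alpha=0.05)
print(f"Rejected: {rejected}")
print(f"Adjusted p-values: {adjusted}")

# Cross-check against statsmodels
sm_rejected, sm_adjusted, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')
assert np.array_equal(rejected, sm_rejected)
assert np.allclose(adjusted, sm_adjusted)
```

Rejecting exactly the hypotheses whose adjusted p-value is at most alpha gives the same answer as the step-up rule; the sketch computes both so the two views can be compared directly.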
Interpretation of the Results in Celebrity Terms

First and Fourth Hypotheses: Both approaches reject these hypotheses. This is a strong sign that we have found the real celebrities!

Second Hypothesis: Only the Benjamini-Hochberg method rejects this hypothesis. Perhaps this is a lesser-known celebrity in whom the Benjamini-Hochberg method (with its inherent optimism) saw great potential. The ultra-conservative Bonferroni, however, prefers to be careful and misses the chance for fear of a false positive.
This metaphor highlights the inherent trade-offs between sensitivity and specificity in statistical corrections, and the importance of choosing the right approach based on the context of your research or, in our playful analogy, the type of party you are attending.
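To put the celebrity verdicts side by side, the sketch below runs both corrections on the same four p-values and labels each hypothesis. It only uses `multipletests` from statsmodels; the verdict labels are illustrative. It also checks a general property: at the same alpha, every hypothesis that Bonferroni rejects is also rejected by Benjamini-Hochberg, which is one way to see Bonferroni trading sensitivity for extra protection against false positives.

```python
from statsmodels.stats.multitest import multipletests

p_values = [0.005, 0.0335, 0.098543, 0.00123]

bonf_rejected, _, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')
bh_rejected, _, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')

for i, (p, bonf, bh) in enumerate(zip(p_values, bonf_rejected, bh_rejected), start=1):
    if bonf and bh:
        verdict = "rejected by both methods"
    elif bh:
        verdict = "rejected only by Benjamini-Hochberg"
    elif bonf:
        verdict = "rejected only by Bonferroni"  # never happens at the same alpha
    else:
        verdict = "rejected by neither method"
    print(f"Hypothesis {i} (p = {p}): {verdict}")

# At the same alpha, Bonferroni's rejections are a subset of Benjamini-Hochberg's
assert all(bh for bonf, bh in zip(bonf_rejected, bh_rejected) if bonf)
```

For the four p-values in this article, it reports that hypotheses 1 and 4 are rejected by both methods, hypothesis 2 only by Benjamini-Hochberg, and hypothesis 3 by neither.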
Wrapping It Up: The Takeaway
Testing a lot of hypotheses is like walking through a field full of potential mistakes. But with the right tools (thanks, Python!) and strategies (hello, Bonferroni and Benjamini-Hochberg), you can manage this process and keep your research sound. Remember, it's all about balancing risk and reward. Whether you are being extra cautious or going for big discoveries, properly handling multiple tests will help make your results more trustworthy.
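How big is that field of mistakes? A quick back-of-the-envelope sketch, assuming the tests are independent and every null hypothesis is actually true: the chance of at least one false positive among m tests, each run at alpha = 0.05, is 1 - (1 - 0.05)^m, which is already about 0.185 for the four tests in this article. Running each test at alpha / m instead, as Bonferroni does, keeps that probability at or below 0.05.

```python
alpha = 0.05
fwer = {}  # m -> (without correction, with Bonferroni)

for m in (1, 4, 10, 20):
    # Assumes m independent tests and that every null hypothesis is true
    uncorrected = 1 - (1 - alpha) ** m       # each test run at alpha
    bonferroni = 1 - (1 - alpha / m) ** m    # each test run at alpha / m
    fwer[m] = (uncorrected, bonferroni)
    print(f"m = {m:2d}: P(at least one false positive) = {uncorrected:.3f} "
          f"uncorrected, {bonferroni:.3f} with Bonferroni")
```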
Have a good data hunt!