How P-Hacking Increases False Positives

December 28, 2019

Sometimes when a hypothesis doesn’t yield a statistically significant result, there is a temptation to tweak the hypothesis a bit or to consider different selections of data. But this is a trap called p-hacking, and it increases the chance of false positives.

Here’s how it happens. You’re performing a hypothesis test and, like most publications, you’re looking for a statistically significant result at the 5% level (i.e. a p-value < 5%). In other words, if there were no real effect, a result at least this extreme would show up by chance only 5% of the time.

Because your hypothesis didn’t work out, you do a post hoc analysis and decide you were looking at the wrong variable or two. So you tweak your hypothesis and run the statistical test again. You keep doing this until you achieve significant results, as demonstrated in this xkcd comic.

However, every additional hypothesis test you ran increased the chance of getting at least one false positive. To see this in action, let’s say you repeated your hypothesis test 20 times (like the comic). With only one test, your chance of a false positive was just 5% (since we require a p-value below 5%). But by the time your 20th test came around, the chance of at least one false positive had grown to about 64%, assuming the tests are independent and none of the effects is real!

Let’s verify this experimentally with a bit of code:

import random

num_of_1s = 0
num_of_experiments = 500000

for _ in range(num_of_experiments):
    # 20 hypothesis tests, each with a 1/20 chance of a fluke result (drawing a 1)
    sequence = [random.randint(1, 20) for _ in range(20)]
    if 1 in sequence:
        num_of_1s += 1

# show the percentage of experiments that had at least one fluke result
print(f'{num_of_1s / num_of_experiments * 100:.2f}%')

64.17%

In the code above, we made sure that each hypothesis test has a 1/20 chance of an erroneous result by drawing from random.randint(1, 20) and treating a 1 as a false positive. Each experiment then consists of 20 such tests.

Since we’re interested in an approximation of the probability of getting at least one false positive, we ran this scenario many times (500,000 in this case) for a final result of 64.17%. Of course, if you shy away from approximations and fancy a little more rigor, we can use some basic probability theory to confirm our results.

The probability of not having a false positive in a single test is 19/20, which means the probability of not having a false positive in any of 20 independent trials is $\left(\frac{19}{20}\right)^{20}$, which is approximately 0.36. Finally, we can determine the probability of having at least one false positive in 20 trials by subtracting 0.36 from 1, giving us 0.64.
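The same calculation works for any number of tests. Here is a minimal sketch, assuming the tests are independent and that no real effect exists in any of them; the function name `familywise_error_rate` is made up for this example.

```python
def familywise_error_rate(alpha, num_tests):
    """Probability of at least one false positive across independent tests.

    Assumes every null hypothesis is true and the tests are independent.
    """
    return 1 - (1 - alpha) ** num_tests


# How quickly the chance of a fluke grows at a 5% threshold
for num_tests in (1, 5, 10, 20, 50):
    rate = familywise_error_rate(0.05, num_tests)
    print(f'{num_tests:>2} tests: {rate:.2%}')
```

For 20 tests at a 5% threshold this comes out to roughly 64%, in line with the simulated figure.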

But what if you’re working for a genomics company where there’s a need to perform multiple hypothesis tests without a strong basis for expecting the result to be statistically significant? Well, there are ways to control the increase in false positives, like the Bonferroni Correction. However, you should keep in mind that each method will have its strengths and weaknesses.
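To get a feel for what the Bonferroni correction does, here is a rough sketch. It tests each hypothesis at the stricter threshold of alpha divided by the number of tests. The helper names `bonferroni_threshold` and `simulate_fwer` are made up for this example, and the simulation assumes independent tests with no real effects, so each p-value is modeled as a uniform random number.

```python
import random


def bonferroni_threshold(alpha, num_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / num_tests


def simulate_fwer(threshold, num_tests, num_experiments, seed=0):
    """Estimate the chance of at least one false positive per experiment."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_experiments):
        # Assumption: with no real effect, each p-value is uniform on [0, 1)
        if any(rng.random() < threshold for _ in range(num_tests)):
            hits += 1
    return hits / num_experiments


alpha, num_tests = 0.05, 20
uncorrected = simulate_fwer(alpha, num_tests, 50_000)
corrected = simulate_fwer(bonferroni_threshold(alpha, num_tests), num_tests, 50_000)

print(f'uncorrected: {uncorrected:.2%}')
print(f'Bonferroni:  {corrected:.2%}')
```

Under these assumptions the corrected rate should land just under 5%, since $1 - (1 - 0.0025)^{20} \approx 0.049$. The price is a much stricter bar for each individual test, which makes real effects harder to detect.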

In the end, you need to be careful when performing hypothesis tests and drawing conclusions. And remember that the more hypothesis tests you perform, the more likely it’ll be that you get a false positive.
