# Mike Love’s blog

## Combining p-values: Fisher’s method, sum of p-values, binomial

Posted in statistics by mikelove on March 12, 2012

I’ve been thinking about methods for combining p-values from independent tests. Fisher’s method is the classic one, but in general it helps to think about the problem in geometric terms. A set of n p-values is a point in an n-dimensional cube. Under the null hypothesis, these points are distributed uniformly in the cube. We want a function F that maps points in this volume to [0,1]. Another requirement is that the volume satisfying $F(x) < z$ should have volume z. We are mostly interested in the points near the origin, sets of small p-values. The question is how to specify which points are close to the origin.

[updated: a paper on this topic: Art Owen, KARL PEARSONâ€™S META-ANALYSIS REVISITED, The Annals of Statistics (2009) (arxiv.org stanford.edu)]

Fisher’s method calculates the product of p-values, then takes the logarithm and multiplies by -2. Under the null hypothesis, we have:

$-2 \sum_{i=1}^{k} \log(p_i) \sim \chi_{2k}^{2}$

The Wikipedia page explains that the negative log of uniform random variables follows an exponential distribution. When scaled by 2 these follow a chi-squared distribution with 2 degrees of freedom, and a sum of chi-squared distributions is also chi-squared with the degrees of freedom summed. If the p-values are lower than expected under the null hypothesis, then the combined value from Fisher’s method will be in the upper tail. We can write this method in R:

fishersMethod = function(x) pchisq(-2 * sum(log(x)),df=2*length(x),lower=FALSE)

I was thinking of summing p-values as well, which would have different behavior to the log product. As described by John Cook, the sum of n random uniform variables follows a law which can be expressed fairly simply. The cumulative distribution functions is as follows (see John Cook’s blog for the density and reference)

$P(X \le x) = \frac{1}{n!} \sum_{k=0}^{n} -1^k \binom{n}{k} (x-k)_{+}^{n}$

And due to the Central Limit Theorem, as the number of random uniforms increases, the sum will converge in distribution to a normal distribution with mean n/2 and variance n/12. So we can also use this law to give a combined p-value to a set of p-values. If the p-values are lower than expected from the null, the sum would show up in the lower tail of the distribution. This cdf written as an R function:

 psumunif = function(x,n) 1/factorial(n) * sum(sapply(0:n, function(k) (-1)^k * choose(n,k) * ifelse(x > k,x-k,0)^(n))) 
and switching for large n to the normal approximation:
 sumPvalsMethod = function(x) { n = length(x) if (n < 10) { psumunif(sum(x),n) } else { pnorm(sum(x),n/2,sqrt(n/12),lower=TRUE) }} 
The difference between these two methods can be visualized with a 2 dimensional example. The axes are possible p-values from 0 to 1 for two independent tests. The bottom left corner is (0,0) and the top right corner is (1,1). The contours of the combined p-values are shown:

 contourplot(outer(seq(0,1,.05),seq(0,1,.05),Vectorize(function(x,y) fishersMethod(c(x,y)))),main="Fisher's method") contourplot(outer(seq(0,1,.05),seq(0,1,.05),Vectorize(function(x,y) sumPvalsMethod(c(x,y)))),main="Sum of p-values method") 
The contours of the sum of p-values method are hyperplanes cutting off the corner of the n-dimensional cube. What might be preferable with the sum method is that the regions near the top left and bottom right corner are given higher values than with Fisher’s method. In other words, Fisher’s method will forgive a few large p-values given that the others are small. For example:
 x = c(1e-3,1e-3,1e-3,1) fishersMethod(x) [1] 1.719731e-06 sumPvalsMethod(x) [1] 0.04216892 
A downside is that this method seems to be more sensitive to unbalanced p-values which can result from a bad construction of the null hypotheses underlying the original tests. The intuition is that the hyperplanes are reaching further into the middle of the cube than the contours of Fisher’s method. For example:
 x = c(runif(1000),runif(100,.1,.2)) fishersMethod(x) [1] 0.01510909 sumPvalsMethod(x) [1] 0.0005628212 
Finally, another method would be to count the number of p-values < .05. Given independent tests, under the null hypothesis for all tests, this count will be distributed as a binomial random variable with size n and probability .05. In this case the upper tail is where a set of too small p-values would show up:

 binomMethod = function(x) pbinom(sum(x < .05),size=length(x),prob=.05,lower=FALSE)

Geometrically, this is represented by blocks of width .05 following the axes. Intersections of blocks result in lower p-values. One problem with this method is from discreteness: with small n, the combined p-value will not be uniformly distributed across [0,1]. In fact, the combined p-values from random uniforms do not look uniform going up to n = 1e4. In comparison, the distributions of the combined p-values from the other methods do look uniform even for small n. Another problem is that in depending on a small subset of the total p-values, the combined p-value could exhibit more variance than methods which use all p-values.

One advantage is in the case mentioned above, with unbalanced p-values. If these are outside of the .05 region, these won’t affect the binomial method very much. For example:

 x = c(runif(1000),runif(100,.1,.2)) fishersMethod(x) [1] 0.01510909 sumPvalsMethod(x) [1] 0.0005628212 binomMethod(x) [1] 0.4642092