Kolmogorov-Smirnov test

The Kolmogorov-Smirnov test has an interesting derivation: to test whether a sample of independent observations comes from a certain distribution, you line up the empirical cumulative distribution function (ECDF) with the theoretical CDF and find the greatest vertical distance between the two. If the sample comes from the specified (continuous) distribution, then as the sample size grows, the max distance times the square root of the sample size converges in distribution to the maximum of the absolute value of the Brownian bridge, known as the Kolmogorov distribution. The Brownian bridge looks like Brownian motion (the Wiener process) except that it is tethered at both ends: it starts at 0 and returns to 0 at time 1. The test makes intuitive sense: under the null, the empirical and theoretical CDFs both start at 0 and end at 1, so their difference is pinned to zero at both ends, with some random variation in between.
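
To make the "greatest vertical distance" concrete, here is a minimal sketch that computes it by hand and compares it with what ks.test reports. Because the ECDF is a step function, the largest gap has to occur at one of the sorted data points, either just after or just before a jump. The seed and the names xs, Fx, D_plus, D_minus and D_manual are my own choices for illustration.

```r
# compute the KS statistic by hand for a standard normal null
set.seed(1)                      # arbitrary seed, only for reproducibility
x  <- rnorm(10)
n  <- length(x)
xs <- sort(x)
Fx <- pnorm(xs)                  # theoretical CDF at the sorted data points

# the ECDF jumps from (i-1)/n to i/n at the i-th sorted point
D_plus  <- max(seq_len(n) / n - Fx)        # ECDF above the CDF, just after a jump
D_minus <- max(Fx - (seq_len(n) - 1) / n)  # ECDF below the CDF, just before a jump
D_manual <- max(D_plus, D_minus)

D_manual
unname(ks.test(x, "pnorm")$statistic)      # should agree with D_manual
```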

If the test statistic (distance * sqrt(n)) is above a chosen quantile of the Kolmogorov distribution, you reject the null hypothesis and conclude that the sample is from a different distribution.  There is also a simple two-sample version of the KS test for checking whether two observed samples come from the same distribution (similar in purpose to the Mann-Whitney-Wilcoxon test).
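
As a rough sketch of the two-sample version: passing a second numeric vector to ks.test instead of a distribution name compares the two ECDFs with each other. The samples a and b, their sizes, the mean shift and the seed are all made up for illustration.

```r
# two-sample KS test: are a and b drawn from the same distribution?
set.seed(42)                 # arbitrary seed, only for reproducibility
a <- rnorm(50)               # hypothetical sample 1: standard normal
b <- rnorm(50, mean = 1)     # hypothetical sample 2: shifted by 1

res <- ks.test(a, b)
res$statistic                # greatest vertical distance between the two ECDFs
res$p.value                  # a small value is evidence against "same distribution"
```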

# get 10 random normal observations
x <- rnorm(10)
plot(ecdf(x),verticals=TRUE,do.points=FALSE,col.hor="red",col.vert="red",xlab="",xlim=c(-3,3),lwd=2,main="empirical cdf")
lines(-30:30/10,pnorm(-30:30/10))

ks.test(x,"pnorm",exact=FALSE)
test <- ks.test(x,"pnorm",exact=FALSE)
D <- test$statistic

One-sample Kolmogorov-Smirnov test
D = 0.2553, p-value = 0.5322
alternative hypothesis: two-sided

# take a look at the Brownian bridge
library(sde)
plot(BBridge(N=5000))
abline(h=0,lty=2)
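
If the sde package is not installed, a Brownian bridge can also be sketched in base R. This assumes the standard construction B(t) = W(t) - t*W(1), where W is Brownian motion approximated by a scaled random walk; the names N, tt, W and B and the seed are my own.

```r
# base-R sketch of a Brownian bridge on [0, 1]
set.seed(1)                                      # arbitrary seed
N  <- 5000                                       # number of time steps
tt <- seq(0, 1, length.out = N + 1)              # time grid
W  <- c(0, cumsum(rnorm(N, sd = sqrt(1 / N))))   # approximate Brownian motion
B  <- W - tt * W[N + 1]                          # subtract the drift so the path ends at 0

plot(tt, B, type = "l", xlab = "t", ylab = "B(t)")
abline(h = 0, lty = 2)
```

The path starts at 0 by construction and is pulled back to 0 at t = 1, which is the "tethering" described above.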

# try to approximate the Kolmogorov distribution
# with 1000 Brownian bridges
K <- replicate(1000,max(abs(BBridge(N=10000))))
plot(ecdf(K),verticals=TRUE,do.points=FALSE,main="distribution of K")
f <- function(x,i) { exp(-(2*i-1)^2*pi^2/(8*x^2)) }
kolm <- function(x) { sqrt(2*pi)/x*(f(x,1)+f(x,2)+f(x,3)+f(x,4)) }
lines(1:200/100,kolm(1:200/100),col="red")

# monte carlo approximation
1 - sum(K < D*sqrt(10))/1000
0.531
# numerical approximation
1 - kolm(D*sqrt(10))
0.5321873
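
As a cross-check, the Kolmogorov distribution has a second well-known series form, with upper tail P(K > t) = 2 * sum over k >= 1 of (-1)^(k-1) * exp(-2 k^2 t^2). The sketch below evaluates it at the observed statistic. The function name kolm_pvalue and the cutoff of 100 terms are my own choices, and D is typed in from the rounded ks.test output above, so the result should only match the p-values above to about three decimal places.

```r
# upper tail of the Kolmogorov distribution via the alternating series
kolm_pvalue <- function(t, terms = 100) {
  k <- seq_len(terms)
  2 * sum((-1)^(k - 1) * exp(-2 * k^2 * t^2))
}

D <- 0.2553        # rounded statistic reported by ks.test above
n <- 10            # sample size used above
p_series <- kolm_pvalue(D * sqrt(n))
p_series           # compare with the Monte Carlo and numerical values above
```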
