# Kolmogorov-Smirnov test

The Kolmogorov-Smirnov test has an interesting derivation. To test whether a sample of independent observations comes from a given continuous distribution, you overlay the empirical cumulative distribution function (ECDF) on the theoretical CDF and find the greatest vertical distance between the two. If the sample does come from the specified distribution, then this maximum distance times the square root of the sample size has, approximately for large samples, the same distribution as the maximum of the absolute value of a Brownian bridge, known as the Kolmogorov distribution. A Brownian bridge looks like Brownian motion (the Wiener process), except that it is tethered at two points. The test makes intuitive sense: under the null, the empirical and theoretical CDFs are tethered together at 0 and 1, with some random variation in between.
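To make the statistic concrete, here is a minimal sketch that computes the greatest vertical distance by hand and compares it with what `ks.test` reports. It assumes a standard normal null; the seed and the names `x_sorted`, `F0`, `D_plus`, `D_minus` and `D_manual` are illustrative choices, not part of the original code. Because the ECDF is a step function, the largest gap can only occur at a data point, just before or just after a jump.

```r
set.seed(42)                      # illustrative seed so the sketch is reproducible
x <- rnorm(10)
n <- length(x)

x_sorted <- sort(x)
F0 <- pnorm(x_sorted)             # theoretical CDF at the sorted observations

# the ECDF jumps from (i-1)/n to i/n at the i-th sorted observation,
# so check the gap on both sides of each jump
D_plus  <- max(seq_len(n) / n - F0)        # ECDF above the theoretical CDF
D_minus <- max(F0 - (seq_len(n) - 1) / n)  # ECDF below the theoretical CDF
D_manual <- max(D_plus, D_minus)

# compare with the statistic reported by ks.test
all.equal(D_manual, unname(ks.test(x, "pnorm")$statistic))
```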

If the test statistic (distance * sqrt(n)) is above a chosen upper quantile of the Kolmogorov distribution, you reject the null hypothesis and conclude that the sample comes from a different distribution. There is also a simple two-sample variant of the KS test that checks whether two observed samples come from the same distribution (much like the Mann-Whitney-Wilcoxon test).
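As a rough sketch of the two-sample variant: `ks.test` also accepts a second sample in place of a distribution name, and the statistic is then the greatest vertical distance between the two ECDFs. The samples `a` and `b`, their sizes, the seed, and the shift of 1 are made up for illustration.

```r
set.seed(1)                  # illustrative seed
a <- rnorm(50)               # sample from N(0, 1)
b <- rnorm(60, mean = 1)     # sample from N(1, 1), shifted on purpose

two <- ks.test(a, b)         # two-sample KS test
two

# the statistic is the largest gap between the two ECDFs,
# which can only occur at one of the pooled observations
pooled <- c(a, b)
D2 <- max(abs(ecdf(a)(pooled) - ecdf(b)(pooled)))
all.equal(D2, unname(two$statistic))
```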

```
# get 10 random normal observations
x <- rnorm(10)
# plot the empirical CDF as a red step function
plot(ecdf(x), verticals=TRUE, do.points=FALSE, col.hor="red", col.vert="red",
     xlab="", xlim=c(-3,3), lwd=2, main="empirical cdf")
# overlay the theoretical standard normal CDF
lines(-30:30/10, pnorm(-30:30/10))
```

```
# one-sample KS test against the standard normal, with the asymptotic p-value
test <- ks.test(x, "pnorm", exact=FALSE)
test
# keep the test statistic D for later
D <- test$statistic
```

One-sample Kolmogorov-Smirnov test
D = 0.2553, p-value = 0.5322
alternative hypothesis: two-sided

```
# take a look at the Brownian bridge
library(sde)
plot(BBridge(N=5000))
abline(h=0, lty=2)
```

```
# try to approximate the Kolmogorov distribution
# with 1000 Brownian bridges
K <- replicate(1000, max(abs(BBridge(N=10000))))
plot(ecdf(K), verticals=TRUE, do.points=FALSE, main="distribution of K")
# i-th term of the series for the Kolmogorov CDF
f <- function(x, i) { exp(-(2*i-1)^2*pi^2/(8*x^2)) }
# Kolmogorov CDF, truncated to the first four terms of the series
kolm <- function(x) { sqrt(2*pi)/x*(f(x,1)+f(x,2)+f(x,3)+f(x,4)) }
lines(1:200/100, kolm(1:200/100), col="red")
```
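For reference, the red curve drawn by `kolm` is the first four terms of a standard series for the Kolmogorov CDF. Writing $K$ for the maximum of the absolute value of the Brownian bridge (notation assumed here to match the variable name in the code):

$$
P(K \le x) = \frac{\sqrt{2\pi}}{x} \sum_{i=1}^{\infty} e^{-(2i-1)^2 \pi^2 / (8x^2)}, \qquad x > 0.
$$

The terms shrink very quickly in $i$ over the plotted range (x up to 2), so truncating at four terms should be adequate there; for much larger x more terms would be needed.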

```
# monte carlo approximation
1 - sum(K < D*sqrt(10))/1000
# 0.531

# numerical approximation
1 - kolm(D*sqrt(10))
# 0.5321873
```