Poisson regression

In trying to explain generalized linear models, I often say something like: GLMs are similar to linear models but with a different domain for the target y, e.g. positive numbers, outcomes in {0,1}, non-negative integers, etc. This explanation bypasses the more interesting point, though: after applying the link function, the optimization problem for fitting the coefficients is entirely different.
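To make the difference in objectives concrete, here is a minimal sketch that fits the Poisson regression coefficients by maximizing the Poisson log-likelihood directly with optim() (glm() actually uses iteratively reweighted least squares, but the optimum is the same). The data and the negloglik helper are illustrative, not from the original post.

```r
# Simulated data, same flavor as the example below
set.seed(1)
x <- rep(c(2, 3), each = 10)
y <- rpois(20, lambda = exp(x))

# Negative Poisson log-likelihood (up to a constant not involving b),
# with the log link: E[y] = exp(b1 + b2*x)
negloglik <- function(b) {
  eta <- b[1] + b[2] * x
  -sum(y * eta - exp(eta))
}

# Direct maximization -- contrast with lm(log(y) ~ x), which instead
# minimizes sum((log(y) - b1 - b2*x)^2)
fit <- optim(c(0, 1), negloglik)
fit$par  # close to coef(glm(y ~ x, family = "poisson"))
```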

[Figure: lm vs glm fitted lines]

This can be seen by comparing the coefficients from a linear regression of log counts to those from a Poisson regression. In many cases the fitted lines are quite similar, but they diverge once you introduce outliers. A rough explanation is that the Poisson likelihood is thrown off more by high counts than by low counts: the high count pulls up the expected value at x=2 in the second plot, while the low count does not substantially pull down the expected value at x=3 in the third plot.

# simulate Poisson counts at x = 2 and x = 3
set.seed(1)
n <- 20
x <- rep(c(2, 3), each = n/2)
y <- rpois(n, lambda = exp(x))
lmfit <- lm(log(y) ~ x)                   # linear model on log counts
glmfit <- glm(y ~ x, family = "poisson")  # Poisson GLM with log link
par(mfrow = c(1, 3))
xlim <- c(1.5, 3.5)
# panel 1: no outliers, the two fits are similar
plot(x, log(y), xlim = xlim)
abline(coef(lmfit), col = "red")
abline(coef(glmfit), col = "blue")
legend("topleft", c("lm", "glm"), col = c("red", "blue"), lty = 1)
# panel 2: one high count at x = 2 pulls up the Poisson fit
y[1] <- 50
lmfit <- lm(log(y) ~ x)
glmfit <- glm(y ~ x, family = "poisson")
plot(x, log(y), xlim = xlim)
abline(coef(lmfit), col = "red")
abline(coef(glmfit), col = "blue")
# panel 3: one low count at x = 3 barely moves the Poisson fit
y <- rpois(n, lambda = exp(x))
y[n] <- 2
lmfit <- lm(log(y) ~ x)
glmfit <- glm(y ~ x, family = "poisson")
plot(x, log(y), xlim = xlim)
abline(coef(lmfit), col = "red")
abline(coef(glmfit), col = "blue")
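The asymmetry has a simple special case worth noting: when the model is just a mean per group, the Poisson MLE of the group mean is the sample mean of the raw counts, while lm on log counts fits the mean of the logs. A raw-scale mean is dominated by large values, so a single high count moves the Poisson fit much more than a single low count does. A small sketch with made-up counts (y2 and y3 are illustrative, not from the plots above):

```r
# One high outlier among small counts: the Poisson fitted log-mean
# (log of the raw-scale mean) is pulled up well above the lm fit
# (mean of the logs)
y2 <- c(7, 8, 6, 50)
log(mean(y2))   # Poisson group fit, pulled up strongly
mean(log(y2))   # lm fit on log counts, less affected

# One low outlier among larger counts: the raw-scale mean barely moves,
# so the Poisson fit is nearly unchanged
y3 <- c(20, 18, 22, 2)
log(mean(y3))   # barely pulled down
mean(log(y3))   # pulled down more
```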