# Binomial GLM for ratios of read counts

For certain sequencing experiments (e.g. methylation data), one might end up with a ratio of read counts at a certain location satisfying a given property (e.g. ‘is methylated’) and want to test if this ratio is significantly associated with a given variable, x.

One way to proceed would be a linear regression of ratio ~ x. However the case when 100 reads cover a nucleotide is not statistically equivalent to the case when 2 reads cover a nucleotide. The binomial probabilities will become increasingly spiked at p*n as the number of reads n increases. So the case with 100 reads covering gives us more information than the case with 2 reads covering.

Here is a bit of code for using the glm() function in R with the binomial distribution with weights representing the covering reads.

```n <- 100
# random poisson number of observations (reads)
# make a N(0,2) predictor variable x
x <- rnorm(n,0,2)
# x will be negatively correlated with the target variable y
beta <- -1
# through a sigmoid curve mapping x*beta to probabilities in [0,1]
p <- exp(x*beta)/(1 + exp(x*beta))
# binomial distribution from the number of observations (reads)

# plot the successes (y) over the total number of trials (reads)
# and order the x-axis by the predictor variable x
par(mfrow=c(2,1),mar=c(2,4.5,1,1))
o <- order(x)
points(y[o],col="red",type="h",lwd=2)
points(y[o],col="red",type="p",pch=20)
par(mar=c(4.5,4.5,0,1))
# more clear to see the relationship
# plot just the ratio