Points and line ranges
Two ways of plotting a grid of points and line ranges. I’m coming around to ggplot2. I recommend skimming the first few chapters of the book to understand what is going on; it only takes about 30 minutes to understand enough to make basic plots.
m <- 10
k <- 3
d <- data.frame(x=factor(rep(1:k,m)), y=rnorm(m*k), z=rep(1:m,each=k))
d$ymax <- d$y + 1
d$ymin <- d$y - 1
# pretty simple
library(ggplot2)
p <- ggplot(d, aes(x=x, y=y, ymin=ymin, ymax=ymax))
p + geom_pointrange() + theme_bw() + facet_wrap(~ z)
# messy
par(mfrow=c(3,4), mar=c(3,3,2,1))
for (i in 1:m) {
  with(d[d$z == i,], {
    plot(as.numeric(x), y, main=i, xlim=c(0,k+1), ylim=c(-3,3), pch=20, xaxt="n")
    axis(1, at=1:k, 1:k)
    segments(as.numeric(x), ymin, as.numeric(x), ymax)
  })
}
Plotting hclust
After many years I’ve finally worked out the x and y coordinates of the points in plot.hclust.
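hang <- 0.07
# dist must be a dissimilarity object here (e.g. the result of dist() on your data), not the dist function itself
hc <- hclust(dist)
plot(hc)
# height at which each observation is first merged, indexed by observation number
pt.heights <- c(hc$height[hc$merge[,1] < 0], hc$height[hc$merge[,2] < 0])[
  order(-1 * c(hc$merge[,1][hc$merge[,1] < 0], hc$merge[,2][hc$merge[,2] < 0]))]
points(1:length(hc$order), pt.heights[hc$order] - hang)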
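To check the indexing, here is a small self-contained sketch. It assumes the built-in USArrests data as a stand-in (the post does not say which data was used), and the name leaf.height is made up for illustration. It computes the first-merge height of each observation with an explicit loop, compares it to the one-liner, and marks those heights on the dendrogram without any hang offset.

```r
# Assumption: USArrests (ships with R) stands in for the post's unspecified data.
hc <- hclust(dist(USArrests[1:10, ]))
n <- length(hc$order)

# Negative entries in hc$merge are single observations: -i means observation i.
# Record, for each observation, the height of the merge it first takes part in.
leaf.height <- numeric(n)
for (j in 1:2) {
  is.leaf <- hc$merge[, j] < 0
  leaf.height[-hc$merge[is.leaf, j]] <- hc$height[is.leaf]
}

# The same quantity computed with the one-liner from the post.
pt.heights <- c(hc$height[hc$merge[,1] < 0], hc$height[hc$merge[,2] < 0])[
  order(-1 * c(hc$merge[,1][hc$merge[,1] < 0], hc$merge[,2][hc$merge[,2] < 0]))]

# x positions are 1..n in plotting order; reorder the heights to match.
plot(hc)
points(1:n, leaf.height[hc$order], pch = 20, col = "red")
```

The red points sit where each leaf's vertical line joins the tree; subtracting an offset, as the post does with hang, moves them down toward the labels.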
German nouns and gender
I’m working on a presentation about classification of strings, and using 240,000 German nouns as an example dataset.

R one liner: Correlation matrices
I have seen many plots of correlation matrices that use rainbow or heat colors and therefore do not indicate the zero crossing. For example:

Instead I would like to see this:
library(fields)
cormat = cor(X)
image.plot(cormat,zlim=c(-1,1),col=colorRampPalette(c("red","white","green"))(49))
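If the fields package is not installed, roughly the same picture can be sketched with base graphics. This assumes the built-in mtcars data as a stand-in for X and gives up the colour legend that image.plot draws. The palette has an odd number of colours, so the middle one is white and lands exactly on zero when zlim is symmetric.

```r
# Assumption: mtcars (ships with R) stands in for the post's X.
cormat <- cor(mtcars)
n <- ncol(cormat)

# Odd number of colours so that the middle one (white) corresponds to zero.
pal <- colorRampPalette(c("red", "white", "green"))(49)

# Symmetric zlim keeps white at a correlation of zero.
image(1:n, 1:n, cormat, zlim = c(-1, 1), col = pal,
      axes = FALSE, xlab = "", ylab = "")
axis(1, at = 1:n, labels = colnames(cormat), las = 2)
axis(2, at = 1:n, labels = rownames(cormat), las = 2)
```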

R one liner: histogram for integers
Here is a function for making better histograms for integers.
For example, you have
x = c(1,1,1,1,1,2,2,2,2,6,6,6,6,7,7,7,7,7,8,8)
If you call hist(x) you get the 1s and 2s in the same bin.
int.hist = function(x,ylab="Frequency",...) {
  barplot(table(factor(x,levels=min(x):max(x))),space=0,xaxt="n",ylab=ylab,...)
  axis(1)
}
Then trying int.hist(x):
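To see the difference without looking at the plots, the bins can be printed directly. This is a small sketch using the same x; the variable names h and tab are made up for illustration. hist() with plot=FALSE returns the breaks and counts it would have drawn, and the table() call is the one int.hist uses internally, with one level per integer so that empty values show up as zeros.

```r
x <- c(1,1,1,1,1,2,2,2,2,6,6,6,6,7,7,7,7,7,8,8)

# What hist() would draw: one column per bin, with its lower and upper edge.
h <- hist(x, plot = FALSE)
print(rbind(lower = head(h$breaks, -1), upper = h$breaks[-1], count = h$counts))

# What int.hist() draws: one bar per integer between min(x) and max(x).
tab <- table(factor(x, levels = min(x):max(x)))
print(tab)
```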

Hist
I find myself always rewriting this code, so instead I will post it.

It is code to make a solid histogram with a thick outline, which is useful for overlaying two empirical distributions or, for example, for putting the x axis on a log scale.
x = rnorm(1000)
h = hist(x,breaks=20,plot=FALSE)
brks = rep(h$breaks,each=2)
counts = c(0,rep(h$counts,each=2),0)
plot(brks,counts,type="n",xlab="Value",ylab="Frequency",main="Histogram")
lines(brks,counts,lwd=2)
polygon(brks,counts,col=rgb(1,1,0,.5))
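For the overlay case, the same steps can be wrapped in a small function. This is only a sketch: outline_coords is a made-up helper name, and the two samples are simulated here for illustration. It returns the doubled-up breaks and counts, so two histograms can share one set of axes.

```r
# Hypothetical helper: outline coordinates of a histogram, as in the code above.
outline_coords <- function(x, breaks = 20) {
  h <- hist(x, breaks = breaks, plot = FALSE)
  list(x = rep(h$breaks, each = 2),
       y = c(0, rep(h$counts, each = 2), 0))
}

set.seed(1)                              # simulated data, for illustration only
a <- outline_coords(rnorm(1000))
b <- outline_coords(rnorm(1000, mean = 1))

# Empty plot wide and tall enough for both outlines.
plot(range(a$x, b$x), range(a$y, b$y), type = "n",
     xlab = "Value", ylab = "Frequency", main = "Histogram")

# Semi-transparent fills, then thick outlines on top.
polygon(a$x, a$y, col = rgb(1, 1, 0, .5), border = NA)
polygon(b$x, b$y, col = rgb(0, 0, 1, .5), border = NA)
lines(a$x, a$y, lwd = 2)
lines(b$x, b$y, lwd = 2)
```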
Area vs length
I was listening to the TAL/ProPublica report on the hedge fund Magnetar and went to their website, which uses bubbles to show the size of the CDOs that Magnetar invested in vs. the market total of similar CDOs.
Long ago, Cleveland and McGill showed that people are better at comparing lengths than areas, yet you still see bubbles (perhaps to save space). Below you can compare two presentations of the same data.
Here is one row from the original chart:
And the same data as a stacked barchart:
Also, it seems the black circle is the total size of the CDO that Magnetar had a part in creating, not the actual amount that Magnetar invested, yet the legend shows a black circle with just the word ‘Magnetar’ next to it.
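The comparison is easy to try in R. The sketch below uses two made-up values (they are not taken from the ProPublica chart) and draws them once as circles and once as bars. One detail to watch: symbols() takes radii, so the radius has to be the square root of the value (up to a constant) for the area to be proportional to it.

```r
# Hypothetical values for illustration only; not from the original chart.
vals <- c(part = 1.5, total = 4)

op <- par(mfrow = c(1, 2))

# Area encoding: radius = sqrt(value / pi), so that pi * r^2 equals the value.
r <- sqrt(vals / pi)
symbols(c(0, 3), c(0, 0), circles = r, inches = FALSE, asp = 1,
        xlim = c(-2, 5), ylim = c(-2, 2), bg = "grey",
        xlab = "", ylab = "", main = "Area")

# Length encoding: bar height is the value itself.
barplot(vals, main = "Length")

par(op)
```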

