# Mike Love’s blog

## Points and line ranges

Posted in statistics, visualization by mikelove on January 13, 2013

Two ways of plotting a grid of points and line ranges. I’m coming around to ggplot2. I recommend skimming the first few chapters of the book to understand what is going on – but it only takes about 30 min or so to understand enough to make basic plots.

```
m <- 10
k <- 3
d <- data.frame(x=factor(rep(1:k,m)), y=rnorm(m*k), z=rep(1:m,each=k))
d$ymax <- d$y + 1
d$ymin <- d$y - 1

# pretty simple
library(ggplot2)
p <- ggplot(d, aes(x=x, y=y, ymin=ymin, ymax=ymax))
p + geom_pointrange() + theme_bw() + facet_wrap(~ z)

# messy
par(mfrow=c(3,4), mar=c(3,3,2,1))
for (i in 1:m) {
  with(d[d$z == i,], {
    plot(as.numeric(x), y, main=i, xlim=c(0,k+1), ylim=c(-3,3), pch=20, xaxt="n")
    axis(1,at=1:k,1:k)
    segments(as.numeric(x),ymin,as.numeric(x),ymax)
  })
}
```

## Plotting hclust

Posted in statistics, visualization by mikelove on August 8, 2012

After many years I’ve finally worked out the x and y coordinates of the points in plot.hclust.

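```
hang <- 0.07
# dist is assumed to be a distance object, such as the output of dist()
hc <- hclust(dist)
plot(hc)
# height of the merge at which each original observation (negative entry in hc$merge) first joins
leaf.in.1 <- hc$merge[,1] < 0
leaf.in.2 <- hc$merge[,2] < 0
pt.heights <- c(hc$height[leaf.in.1], hc$height[leaf.in.2])[order(-1 * c(hc$merge[,1][leaf.in.1], hc$merge[,2][leaf.in.2]))]
points(1:length(hc$order), pt.heights[hc$order] - hang)
```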
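The height lookup above can also be written step by step. This is only a sketch: the built-in `USArrests` data is an assumed stand-in for the original distances, and the names `leaf.rows`, `leaf.ids` and `leaf.heights` are made up for illustration.

```r
# Example data (an assumption): first 10 rows of the built-in USArrests data
hc <- hclust(dist(USArrests[1:10, ]))

# Negative entries in hc$merge are original observations (leaves);
# the row of the entry is the merge step at which that leaf joins the tree
leaf.rows <- which(hc$merge < 0, arr.ind = TRUE)[, "row"]
leaf.ids <- -hc$merge[hc$merge < 0]

# Height of the merge step for each leaf, indexed by observation number
leaf.heights <- numeric(length(hc$order))
leaf.heights[leaf.ids] <- hc$height[leaf.rows]

# Reorder to the left-to-right order used in the dendrogram
leaf.heights[hc$order]
```

Both `which(..., arr.ind = TRUE)` and the logical subsetting walk the merge matrix column by column, so the rows and ids stay aligned.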

## German nouns and gender

Posted in visualization by mikelove on May 11, 2012

I’m working on a presentation about classification of strings, and am using 240,000 German nouns as an example dataset.

## R one liner: Correlation matrices

Posted in statistics, visualization by mikelove on March 31, 2011

I have seen many plots of correlation matrices that use rainbow or heat colors and therefore do not indicate the zero crossing.

Instead I would like to see this:

```
library(fields)
cormat = cor(X)
image.plot(cormat,zlim=c(-1,1),col=colorRampPalette(c("red","white","green"))(49))
```
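What makes the zero crossing visible is the combination of a symmetric `zlim` and an odd number of colors, which puts white in the middle of the palette. Here is a minimal sketch of that idea using only base graphics, with the built-in `mtcars` data assumed as a stand-in for `X`:

```r
# Example data (an assumption): mtcars stands in for X
cormat <- cor(mtcars)
n <- ncol(cormat)

# An odd number of colors gives the palette a single middle entry (white)
pal <- colorRampPalette(c("red", "white", "green"))(49)

# zlim = c(-1, 1) is symmetric, so the middle of the palette sits at zero
image(1:n, 1:n, cormat, zlim = c(-1, 1), col = pal,
      axes = FALSE, xlab = "", ylab = "")
axis(1, at = 1:n, labels = colnames(cormat), las = 2)
axis(2, at = 1:n, labels = rownames(cormat), las = 2)
```

Base `image()` draws no color legend, which is what `image.plot` from `fields` adds.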

## R one liner: histogram for integers

Posted in statistics, visualization by mikelove on March 30, 2011

Here is a function for making better histograms for integers.

For example, you have

`x = c(1,1,1,1,1,2,2,2,2,6,6,6,6,7,7,7,7,7,8,8)`

If you call `hist(x)` you get the 1s and 2s in the same bin.

```
int.hist = function(x,ylab="Frequency",...) { barplot(table(factor(x,levels=min(x):max(x))),space=0,xaxt="n",ylab=ylab,...);axis(1) }
```

Then trying `int.hist(x)`:
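The part of `int.hist` that does the work is `factor(x, levels=min(x):max(x))`, which keeps a bin for every integer in the range, including those that never occur. A small sketch of the counts that end up in the barplot, next to the counts from `hist`:

```r
x <- c(1,1,1,1,1,2,2,2,2,6,6,6,6,7,7,7,7,7,8,8)

# hist() with default breaks: the 1s and 2s share the first bin
h <- hist(x, plot = FALSE)
h$breaks
h$counts

# One bin per integer from min(x) to max(x); unused integers get a count of 0
counts <- table(factor(x, levels = min(x):max(x)))
counts
```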

## Hist

Posted in statistics, visualization by mikelove on January 4, 2011

I find myself always rewriting this code, so instead I will post it.

It is code to make a solid histogram with a thick outline, which is useful for overlaying two empirical distributions or for changing the x axis to a log scale, for example.

```
x = rnorm(1000)
h = hist(x,breaks=20,plot=FALSE)
brks = rep(h$breaks,each=2)
counts = c(0,rep(h$counts,each=2),0)
plot(brks,counts,type="n",xlab="Value",ylab="Frequency",main="Histogram")
lines(brks,counts,lwd=2)
polygon(brks,counts,col=rgb(1,1,0,.5))
```
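For the overlay case, the same outline trick can be wrapped in a small function. This is a sketch, not a polished implementation: `hist.outline` is a hypothetical helper name, and the two samples and the shared breaks are made-up example values.

```r
# Hypothetical helper: returns the outline coordinates of a histogram
hist.outline <- function(x, breaks) {
  h <- hist(x, breaks = breaks, plot = FALSE)
  list(x = rep(h$breaks, each = 2),
       y = c(0, rep(h$counts, each = 2), 0))
}

set.seed(1)
a <- rnorm(1000)
b <- rnorm(1000, mean = 1)
brks <- seq(-6, 7, by = 0.5)  # shared breaks so the bins of both samples line up

oa <- hist.outline(a, brks)
ob <- hist.outline(b, brks)

# Empty plot sized to hold both outlines, then one translucent polygon per sample
plot(range(brks), c(0, max(oa$y, ob$y)), type = "n",
     xlab = "Value", ylab = "Frequency", main = "Two histograms")
polygon(oa$x, oa$y, col = rgb(1, 1, 0, 0.5), lwd = 2)
polygon(ob$x, ob$y, col = rgb(0, 0, 1, 0.5), lwd = 2)
```

Using the same breaks for both samples keeps the bars comparable, and the translucent fill from `rgb(..., 0.5)` lets the overlap show through.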

## Area vs length

Posted in visualization by mikelove on April 22, 2010

I was listening to the TAL/ProPublica report on the hedge fund Magnetar and went to their website, which shows bubbles sized by the CDOs that Magnetar invested in vs. the market total of similar CDOs.

Long ago, Cleveland and McGill showed that people are better at comparing lengths than areas, but you’ll still see bubbles (perhaps to save space). Below you can compare two presentations of the same data.

Here is one row from the original chart:

And the same data as a stacked barchart:

Also, the black circle appears to be the total size of the CDO that Magnetar had a part in creating, not the amount that Magnetar actually invested, yet the legend shows a black circle with just the word ‘Magnetar’ printed next to it.