It’s great when you can easily reproduce statistical analysis in a paper because 1) the authors made their data available and their methods clear, and 2) the existence of open source software for bioinformatics like Bioconductor. I wanted to use the data from a July 2010 paper in Genome Research: MicroRNA, mRNA, and protein expression link development and aging in human and macaque brain (Somel, M.). The article mentions the data was submitted to the NCBI Gene Expression Omnibus ( under series accession no. GSE18069. Going to this page then you can find the mRNA expression at series number GSE17757.

e = getGEO("GSE17757")[[1]]
[1] 13317 51

So here’s the first figure in the paper, showing the first two principal components of mRNA expression separate the development and aging phases of the 23 humans.

The first two principal components of mRNA … expression in human and rhesus macaque brains. The analysis was performed by singular value decomposition, using the ‘‘prcomp’’ function in the R ‘‘stats’’ package, with each gene scaled to unit variance before analysis.

nas = apply(exprs(e[,pData(e)$organism_ch=="Homo sapiens"]),1,function(x) sum(
e2 = e[nas==0,pData(e)$organism_ch=="Homo sapiens"]
years = as.numeric(sub("age: (\\d+) day[s]*", "\\1", as.character(pData(e2)$characteristics_ch1.1)))/365
batch = pData(e2)$characteristics_ch1.3 == "batch: human batch 2"
exprs(e2)[,batch] = t(scale(t(exprs(e2)[,batch])))
exprs(e2)[,!batch] = t(scale(t(exprs(e2)[,!batch])))
pc = prcomp(t(exprs(e2)))


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s