Sparse matrix operations in python and R
I’m comparing two libraries for working with sparse matrices: scipy.sparse for python and Matrix for R.
I create a matrix of 10 million rows x 100 columns, filled randomly with 100 million standard normals (which overlap).
Python wins matrix creation, they are about equal for multiplication (XtX), python wins also taking row means, slicing 25 columns and then binding the columns together.
.
.
> python sparse.py -m 1e7 -n 1e2 -l 1e8 create: 19.51 multiply: 36.08 row means: 2.27 column slices: 4.88 hstack: 1.56 > R --no-save --slave '--args m=1e7 n=1e2 l=1e8' < sparse.R create: 40.548 multipy: 40.034 row means: 9.327 column slices: 6.755 cbind: 20.167
.
.
Doubling the rows and columns increases the speed advantage of python. Matrix creation and column slicing is now 4 times faster with python.
.
.
> python sparse.py -m 2e7 -n 2e2 -l 4e8 create: 83.1 multiply: 197.5 row means: 8.53 column slices: 9.26 hstack: 3.16 > R --no-save --slave '--args m=2e7 n=2e2 l=4e8' < sparse.R create: 258.474 multipy: 211.625 row means: 38.109 column slices: 41.219 cbind: 43.592
.
.
The scripts are:
.
.
###############
# sparse.py
from scipy import sparse
import time
import numpy
import scipy
import getopt
import sys
try:
opts, args = getopt.getopt(sys.argv[1:], 'm:n:l:')
except getopt.GetoptError, err:
print str(err)
sys.exit(2)
for o,a in opts:
if o == '-m':
m = int(float(a))
if o == '-n':
n = int(float(a))
if o == '-l':
l = int(float(a))
i = numpy.random.randint(0,m,l)
j = numpy.random.randint(0,n,l)
x = numpy.random.randn(l)
t0 = time.clock(); y = scipy.sparse.coo_matrix((x,(i,j)),shape=(m,n)).tocsc(); print "create:", time.clock() - t0
t0 = time.clock(); z = y.transpose(copy=True) * y; print "multiply:", time.clock() - t0
t0 = time.clock(); rms = y.mean(1); print "row means:", time.clock() - t0
t0 = time.clock(); ys = [y.getcol(k) for k in range(25)]; print "column slices:", time.clock() - t0
t0 = time.clock(); a = scipy.sparse.hstack(ys); print "hstack:", time.clock() - t0
###############
.
.
.
.
###############
# sparse.R
args=(commandArgs(TRUE))
if (length(args)==0) {
print("No arguments supplied.")
} else {
for (i in 1:length(args)) {
eval(parse(text=args[[i]]))
}
}
i <- sample(m,l,TRUE)
j <- sample(n,l,TRUE)
x <- rnorm(l)
library(lattice,warn=FALSE,verbose=FALSE)
library(Matrix,warn=FALSE,verbose=FALSE)
mytimer <- function(...) as.numeric(system.time(...)[3])
cat("create:",mytimer({y <- sparseMatrix(x=x,i=i,j=j,dims=c(m,n))}),"\n")
cat("multipy:",mytimer({z <- crossprod(y)}),"\n")
cat("row means:",mytimer({rms <- rowMeans(y)}),"\n")
cat("column slices:",mytimer({ys <- lapply(1:25,function(k) y[,k,drop=FALSE])}),"\n")
cat("cbind:",mytimer({a <- do.call(cBind,ys)}),"\n")
###############
leave a comment