Sparse matrix operations in python and R

I’m comparing two libraries for working with sparse matrices: scipy.sparse for python and Matrix for R.

I create a matrix of 10 million rows x 100 columns, filled randomly with 100 million standard normals (which overlap).

Python wins matrix creation, they are about equal for multiplication (XtX), python wins also taking row means, slicing 25 columns and then binding the columns together.
.
.

> python sparse.py -m 1e7 -n 1e2 -l 1e8
create: 19.51
multiply: 36.08
row means: 2.27
column slices: 4.88
hstack: 1.56

> R --no-save --slave '--args m=1e7 n=1e2 l=1e8' < sparse.R
create: 40.548
multipy: 40.034
row means: 9.327
column slices: 6.755
cbind: 20.167

.
.

Doubling the rows and columns increases the speed advantage of python. Matrix creation and column slicing is now 4 times faster with python.
.
.

> python sparse.py -m 2e7 -n 2e2 -l 4e8
create: 83.1
multiply: 197.5
row means: 8.53
column slices: 9.26
hstack: 3.16

> R --no-save --slave '--args m=2e7 n=2e2 l=4e8' < sparse.R
create: 258.474
multipy: 211.625
row means: 38.109
column slices: 41.219
cbind: 43.592

.
.
The scripts are:
.
.

###############
# sparse.py
from scipy import sparse
import time
import numpy
import scipy
import getopt
import sys
try:
    opts, args = getopt.getopt(sys.argv[1:], 'm:n:l:')
except getopt.GetoptError, err:
    print str(err)
    sys.exit(2)
for o,a in opts:
    if o == '-m':
        m = int(float(a))
    if o == '-n':
        n = int(float(a))
    if o == '-l':
        l = int(float(a))
i = numpy.random.randint(0,m,l)
j = numpy.random.randint(0,n,l)
x = numpy.random.randn(l)
t0 = time.clock(); y = scipy.sparse.coo_matrix((x,(i,j)),shape=(m,n)).tocsc(); print "create:", time.clock() - t0
t0 = time.clock(); z = y.transpose(copy=True) * y; print "multiply:", time.clock() - t0
t0 = time.clock(); rms = y.mean(1); print "row means:", time.clock() - t0
t0 = time.clock(); ys = [y.getcol(k) for k in range(25)]; print "column slices:", time.clock() - t0
t0 = time.clock(); a = scipy.sparse.hstack(ys); print "hstack:", time.clock() - t0
###############

.
.
.
.

###############
# sparse.R
args=(commandArgs(TRUE))
if (length(args)==0) {
  print("No arguments supplied.")
} else {
  for (i in 1:length(args)) {
    eval(parse(text=args[[i]]))
  }
}
i <- sample(m,l,TRUE)
j <- sample(n,l,TRUE)
x <- rnorm(l)
library(lattice,warn=FALSE,verbose=FALSE)
library(Matrix,warn=FALSE,verbose=FALSE)
mytimer <- function(...) as.numeric(system.time(...)[3])
cat("create:",mytimer({y <- sparseMatrix(x=x,i=i,j=j,dims=c(m,n))}),"\n")
cat("multipy:",mytimer({z <- crossprod(y)}),"\n")
cat("row means:",mytimer({rms <- rowMeans(y)}),"\n")
cat("column slices:",mytimer({ys <- lapply(1:25,function(k) y[,k,drop=FALSE])}),"\n")
cat("cbind:",mytimer({a <- do.call(cBind,ys)}),"\n")
###############
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s