# Sparse matrix operations in python and R

I’m comparing two libraries for working with sparse matrices: scipy.sparse for python and Matrix for R.

I create a matrix of 10 million rows x 100 columns, filled randomly with 100 million standard normals (which overlap).

Python wins matrix creation, they are about equal for multiplication (XtX), python wins also taking row means, slicing 25 columns and then binding the columns together.
.
.

```> python sparse.py -m 1e7 -n 1e2 -l 1e8
create: 19.51
multiply: 36.08
row means: 2.27
column slices: 4.88
hstack: 1.56

> R --no-save --slave '--args m=1e7 n=1e2 l=1e8' < sparse.R
create: 40.548
multipy: 40.034
row means: 9.327
column slices: 6.755
cbind: 20.167
```

.
.

Doubling the rows and columns increases the speed advantage of python. Matrix creation and column slicing is now 4 times faster with python.
.
.

```> python sparse.py -m 2e7 -n 2e2 -l 4e8
create: 83.1
multiply: 197.5
row means: 8.53
column slices: 9.26
hstack: 3.16

> R --no-save --slave '--args m=2e7 n=2e2 l=4e8' < sparse.R
create: 258.474
multipy: 211.625
row means: 38.109
column slices: 41.219
cbind: 43.592
```

.
.
The scripts are:
.
.

```###############
# sparse.py
from scipy import sparse
import time
import numpy
import scipy
import getopt
import sys
try:
opts, args = getopt.getopt(sys.argv[1:], 'm:n:l:')
except getopt.GetoptError, err:
print str(err)
sys.exit(2)
for o,a in opts:
if o == '-m':
m = int(float(a))
if o == '-n':
n = int(float(a))
if o == '-l':
l = int(float(a))
i = numpy.random.randint(0,m,l)
j = numpy.random.randint(0,n,l)
x = numpy.random.randn(l)
t0 = time.clock(); y = scipy.sparse.coo_matrix((x,(i,j)),shape=(m,n)).tocsc(); print "create:", time.clock() - t0
t0 = time.clock(); z = y.transpose(copy=True) * y; print "multiply:", time.clock() - t0
t0 = time.clock(); rms = y.mean(1); print "row means:", time.clock() - t0
t0 = time.clock(); ys = [y.getcol(k) for k in range(25)]; print "column slices:", time.clock() - t0
t0 = time.clock(); a = scipy.sparse.hstack(ys); print "hstack:", time.clock() - t0
###############
```

.
.
.
.

```###############
# sparse.R
args=(commandArgs(TRUE))
if (length(args)==0) {
print("No arguments supplied.")
} else {
for (i in 1:length(args)) {
eval(parse(text=args[[i]]))
}
}
i <- sample(m,l,TRUE)
j <- sample(n,l,TRUE)
x <- rnorm(l)
library(lattice,warn=FALSE,verbose=FALSE)
library(Matrix,warn=FALSE,verbose=FALSE)
mytimer <- function(...) as.numeric(system.time(...)[3])
cat("create:",mytimer({y <- sparseMatrix(x=x,i=i,j=j,dims=c(m,n))}),"\n")
cat("multipy:",mytimer({z <- crossprod(y)}),"\n")
cat("row means:",mytimer({rms <- rowMeans(y)}),"\n")
cat("column slices:",mytimer({ys <- lapply(1:25,function(k) y[,k,drop=FALSE])}),"\n")
cat("cbind:",mytimer({a <- do.call(cBind,ys)}),"\n")
###############
```