Splitting data

The caret package has a handy function, createDataPartition, for splitting data into balanced subsets. What I don’t see is why I get 4 rows rather than 3 out of 10 in this example. The p argument is documented as “the percentage of data that goes to training”.


d <- data.frame(x=rnorm(10), group=factor(c(1,1,1,2,2,2,3,3,3,3)))
d
            x group
1   1.0089900     1
2   0.4854706     1
3   1.7083259     1
4  -1.3362274     2
5   1.4905259     2
6   1.6451234     2
7   1.0361174     3
8   0.2369341     3
9  -2.0043264     3
10  1.4361718     3
library(caret)
# Stratified sample on group; rows are drawn at random, so the output varies between runs
d[createDataPartition(d$group, p=3/10)$Resample1,]
            x group
3   1.7083259     1
4  -1.3362274     2
8   0.2369341     3
10  1.4361718     3
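
One likely explanation is that createDataPartition samples within each level of the factor and rounds each group's share up, rather than taking p of the whole data set. Under that assumption (not confirmed here against the caret source), groups of 3, 3 and 4 rows give ceiling(0.9) + ceiling(0.9) + ceiling(1.2) = 1 + 1 + 2 = 4 rows, which matches the one, one and two rows per group in the output above. A minimal base-R sketch of that arithmetic, with an optional comparison against caret if it is installed:

```r
# Group labels from the example above: group sizes are 3, 3 and 4.
group <- factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3))
p <- 3 / 10

# Number of rows in each group.
n_per_group <- tabulate(group)

# ASSUMPTION: each group contributes ceiling(p * n) rows,
# i.e. the split is done per group and rounded up.
expected_per_group <- ceiling(p * n_per_group)
expected_per_group       # 1 1 2
sum(expected_per_group)  # 4

# Optional check against caret, skipped if the package is not installed.
# If the assumption holds, the per-group counts should match the ones above.
if (requireNamespace("caret", quietly = TRUE)) {
  idx <- caret::createDataPartition(group, p = p, list = FALSE)
  print(table(group[idx]))
}
```

If that is what is happening, p behaves as a per-group proportion, so small or uneven groups can push the total above p times the number of rows.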
