When looking at sequential data (e.g. time-series or genomic data), any inference comparing different sequences needs to take into account local correlations within a sequence. For example, you might want to know how often it is raining in two cities at the same time, and whether this happens more often than expected by chance. But it is more likely to rain on a given day if it rained the day before, and this dependence changes the distribution of overlap expected by chance. In stochastic-process terms, the observations are serially dependent (autocorrelated). The resampling methods below also assume the process is 'stationary', meaning its statistical behaviour does not change along the sequence.
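To see how day-to-day persistence changes the overlap expected by chance, here is a small Python sketch (my own illustration, not the original R code). It assumes each city's rain record is an independent two-state Markov chain that repeats yesterday's state with probability `p_stay`; `p_stay = 0.5` gives independent days with the same 50% rain rate. The function names and parameter values are hypothetical.

```python
import random
import statistics


def rain_chain(n_days, p_stay, rng):
    """Symmetric two-state Markov chain: 1 = rain, 0 = dry.

    Each day repeats yesterday's state with probability p_stay, so
    p_stay = 0.5 gives independent days. The chain starts from its
    stationary 50/50 distribution.
    """
    state = 1 if rng.random() < 0.5 else 0
    days = []
    for _ in range(n_days):
        days.append(state)
        if rng.random() >= p_stay:
            state = 1 - state  # switch between rain and dry
    return days


def overlap(x, y):
    """Number of days on which it rains in both cities."""
    return sum(a & b for a, b in zip(x, y))


def overlap_null(p_stay, n_days=365, reps=500, seed=1):
    """Mean and variance of the overlap between two independent cities."""
    rng = random.Random(seed)
    counts = [
        overlap(rain_chain(n_days, p_stay, rng), rain_chain(n_days, p_stay, rng))
        for _ in range(reps)
    ]
    return statistics.mean(counts), statistics.variance(counts)


if __name__ == "__main__":
    for p_stay in (0.5, 0.9):
        mean, var = overlap_null(p_stay)
        print(f"p_stay={p_stay}: mean overlap {mean:.1f}, variance {var:.1f}")
```

Both settings have the same expected overlap (about a quarter of the days), but the persistent chain spreads the overlap much more widely, so a test that treats days as independent would call chance overlaps significant too often.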

One way around the problem of estimating the distribution of overlap between two processes under chance is the block bootstrap. Instead of randomly shifting features along the sequence (what I call naive permutation), you build new sequences at random from large blocks of the original sequence. Repeating this many times gives a distribution of the overlap of features expected by chance. Here is a single bootstrap sample (top sequence) constructed in this manner from the data (bottom sequence).
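The resampling step can be sketched in a few lines of Python. This is the fixed-length ("moving block") variant, under my own assumptions: the sequence is a list of 0/1 feature indicators, blocks are drawn with replacement from all windows of `block_len` consecutive positions, and `block_bootstrap` is a hypothetical name rather than anything from the original R code.

```python
import random


def block_bootstrap(seq, block_len, rng):
    """Resample seq by concatenating randomly chosen blocks of block_len
    consecutive positions (drawn with replacement), truncated to len(seq).
    Clusters of features are preserved except where a block boundary
    happens to cut through one.
    """
    n = len(seq)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(seq)")
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)  # any block fully inside seq
        out.extend(seq[start:start + block_len])
    return out[:n]


if __name__ == "__main__":
    rng = random.Random(42)
    # hypothetical data: two clusters of 20 features in a sequence of 200
    data = [0] * 30 + [1] * 20 + [0] * 50 + [1] * 20 + [0] * 80
    sample = block_bootstrap(data, 50, rng)
    print("".join(map(str, sample)))  # bootstrap sample (top)
    print("".join(map(str, data)))    # original data (bottom)
```

Using long blocks keeps the within-sequence clustering that a position-by-position shuffle would destroy; the price is that the result depends on `block_len`.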

Here are histograms comparing several ways of estimating the null distribution of overlaps between two sequences, with the true null distribution on top (features occur in clusters of size 20). The block bootstrap can estimate both the mean and the variance of the null distribution much more accurately. Choosing the block size is a separate problem, and Politis and Romano (below) explore the effect of using randomly sized blocks instead of fixed-size blocks.
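To make the comparison concrete, here is a rough Python sketch, not the original R code. The setup is my own assumption: each sequence has length 1000 with five clusters of 20 features at random positions; the "true null" regenerates the second sequence from that same process; naive permutation is read as a uniform shuffle of positions; and the random-block resampler follows the stationary bootstrap of Politis and Romano (block starts chosen uniformly, geometric block lengths with mean `mean_block`, wrapping around the end of the sequence). The mean block length of 40 is an arbitrary choice.

```python
import random
import statistics


def clustered_sequence(n, n_clusters, cluster_len, rng):
    """0/1 sequence with n_clusters runs of cluster_len features at random
    start positions (runs may overlap one another)."""
    seq = [0] * n
    for _ in range(n_clusters):
        start = rng.randrange(n - cluster_len + 1)
        for i in range(start, start + cluster_len):
            seq[i] = 1
    return seq


def overlap(x, y):
    """Number of positions where both sequences carry a feature."""
    return sum(a & b for a, b in zip(x, y))


def naive_permutation(seq, rng):
    """Shuffle positions: keeps the feature count but destroys clustering."""
    out = list(seq)
    rng.shuffle(out)
    return out


def stationary_bootstrap(seq, mean_block, rng):
    """Stationary-bootstrap resample: blocks start at uniform random positions,
    have geometric lengths with mean mean_block, and wrap around the end."""
    n = len(seq)
    p_new = 1.0 / mean_block  # chance of starting a new block at each step
    pos = rng.randrange(n)
    out = []
    while len(out) < n:
        out.append(seq[pos])
        if rng.random() < p_new:
            pos = rng.randrange(n)  # jump: begin a new block
        else:
            pos = (pos + 1) % n  # extend the current block circularly
    return out


def null_overlaps(x, y, resample, reps, rng):
    """Overlap of x with `reps` resampled copies of y."""
    return [overlap(x, resample(y, rng)) for _ in range(reps)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, k, size, reps = 1000, 5, 20, 500
    x = clustered_sequence(n, k, size, rng)
    y = clustered_sequence(n, k, size, rng)
    nulls = {
        "true null": [overlap(x, clustered_sequence(n, k, size, rng)) for _ in range(reps)],
        "naive permutation": null_overlaps(x, y, naive_permutation, reps, rng),
        "stationary bootstrap": null_overlaps(
            x, y, lambda s, r: stationary_bootstrap(s, 2 * size, r), reps, rng
        ),
    }
    for name, counts in nulls.items():
        print(f"{name:>20}: mean {statistics.mean(counts):6.2f}, "
              f"variance {statistics.variance(counts):7.2f}")
```

Comparing each printed mean and variance against the "true null" line gives a numerical version of the histograms, and changing `mean_block` is a quick way to see how sensitive the estimate is to block size.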

A reference for this problem in genomic inference is: Bickel PJ, Boley N, Brown JB, Huang H, Zhang NR. Non-Parametric Methods for Genomic Inference. 2010. http://www.stat.berkeley.edu/~bickel/Bickel%20et%20al%202010%20AAS.pdf

A more general reference is: Politis DN, Romano JP. The Stationary Bootstrap. Journal of the American Statistical Association, 89(428):1303-1313, December 1994. http://www.jstor.org/discover/10.2307/2290993?uid=3737864&uid=2129&uid=2&uid=70&uid=4&sid=21101109652181

The R code for this example is here.

your link to your R code doesn’t work… and this is exactly what I have been trying to do for the past week! Help!! 🙂

Thank you!