Bootstrap-t confidence intervals

There are multiple methods to generate a bootstrap confidence interval.  The simplest in terms of implementation is a percentile bootstrap interval: generate B bootstrap samples of the original data and calculate B statistics from these samples. Then take the alpha/2 and 1-alpha/2 quantiles of the statistic to form a confidence interval of level alpha. However, this interval does not behave as well as other intervals, as we will show:

Take 20 random exponential variables with mean 3.  In R this looks like:

x = rexp(20,rate=1/3)

Then generate B=1000 bootstrap samples of x, and calculate the mean for each bootstrap sample.

s = numeric(B)
for (j in 1:B) {
boot = sample(n,replace=TRUE)
s[j] = mean(x[boot])

Then, for an alpha = .05 / 95% confidence interval, look at the .025 and .975 quantiles of the bootstrap statistics in the vector s: = quantile(s,c(.025,.975))

If I repeat this process from the start (including drawing a new x of 20 random exponential variables of mean 3) I can see how often the intervals actually contain the true mean. 11 of the intervals do not contain the true mean 3.  On average we would expect 5 if these are 95% confidence intervals.  If I repeat this 1000 times, I get 88.4% of the intervals containing the true mean.

In An Introduction to the Bootstrap by Efron and Tibshirani, the authors write,

The quantity (theta-hat – theta)/se-hat is called an approximate pivot: this means that its distribution is approximately the same for each value of theta….

Some elaborate theory shows that in large samples the coverage of the bootstrap-t interval tends to be closer to the desired level than the coverage of the standard interval or the interval based on the t table….

Notice also that the normal and t percentage points are symmetric about zero, and as a consequence the resulting intervals are symmetric about the point estimate theta-hat.  In contrast, the bootstrap-t percentiles can be asymmetric about 0, leading to intervals which are longer on the left or right.  This asymmetry represents an important part of the improvement in coverage it enjoys.

Note the authors here are not comparing the bootstrap-t interval to the bootstrap percentile method. Also, they add a caveat: “The bootstrap-t method, at least in its simple form, cannot be trusted for more general problems, like setting a confidence interval for a correlation coefficient.”

But it works well for calculating the mean, as in this case. This time we calculate a pivotal quantity as the bootstrapped statistic. For the vector x of random exponential variables of mean 3, we first calculate the mean and standard deviation of the original dataset:

x = rexp(n,rate=1/true.mean)
mean.x = mean(x)
sd.x = sd(x)

Then for each bootstrap sample, calculate the difference between the mean of the bootstrap data and the original data, divided by the standard deviation of the bootstrap (note that there may be better estimators for the denominator here).

z = numeric(B)
for (j in 1:B) {
boot = sample(n,replace=TRUE)
z[j] = (mean(x[boot]) - mean.x)/sd(x[boot])

Then to form a confidence interval, take quantiles of our bootstrapped statistic, multiply by the standard deviation of the original data and substract this from the mean of the original data: = mean.x - sd.x * quantile(z,c(.975,.025))

If I do this 1000 times, I get 95.1% of the intervals containing the true mean. The mean interval size has increased, from 2.4 for the simple intervals to 3.2 for the bootstrap-t intervals.

Here is an R script for this simulation.


6 thoughts on “Bootstrap-t confidence intervals”

  1. Hi there,

    I couldn’t find a “contact us” page, so I am writing to you this massage here:

    I run the service R-bloggers:
    An aggregator of R related articles, from blogs.

    And wanted to encourage you to join at:

    The idea behind the project is to share readers in order to gain readers: R-bloggers already has over 800 RSS subscribers (that are growing everyday).

    I built it in order to find all the R bloggers out there. So far I found over 45 bloggers, which also agreed to add there feed (and some to give a link back and post about it).

    And would love it if you might agree to join as well.

    Feel free to erase this comment if it clutters the blog too much.

    All the best,

  2. Good article, question below — you are using, SD — but it should be SE?

    se.x = sd(x) / sqrt(len(n))

    rather than ?

    sd.x = sd(x)

  3. Hey mike, I’m wondering how long such a script takes you? Bootstrap T with 1000 bootstraps and 1000 nested bootstraps. I’m running a Bootstrap T to get a confidence interval for a logistic regression. Its been running for 2 weeks!!


    for(i in 1:1000)
    boot<-forbootframe[sample(297, 297,replace=TRUE), ]
    bootlogit<-glm(Case~gender+allele+Ever+allele*Ever, family=binomial(logit),data=boot)

    for(u in 1:1000)
    nestboot<-boot[297, 297,replace=TRUE), ]
    bootlogit<-glm(Case~gender+allele+Ever+allele*Ever, family=binomial(logit),data=nestboot)



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s