The bootstrap is a tool for making statistical inferences when standard parametric assumptions are questionable. For example, it is well known that significance tests for regression coefficients may be misleading when the errors are not normally distributed. Another way of thinking about the bootstrap is that it is a method for computing confidence intervals around just about any statistic one could possibly want to estimate even when no formula exists for calculating a standard error. The term “bootstrap” is a reference to the notion of “pulling oneself up by the bootstraps” when the usual methods for ascertaining statistical significance do not apply (Efron and Tibshirani 1993: 5).
Traditional statistical inference is based on the assumption of drawing repeated samples from a population. A statistic (e.g. the mean or a regression coefficient) is assumed to be fixed in the population, but if a researcher were to draw multiple samples from the population the estimate would be a little different each time. The term sampling distribution refers to the distribution that would result from taking multiple samples from a population and re-estimating a statistic on each new set of observations. Most often the sampling distribution is assumed to be normal, and a statistic is declared to be “significant” if the 95% confidence interval (the area under the curve excluding the smallest and largest 2.5% of values) does not include zero.
When parametric assumptions are violated, or when no established formula exists for estimating standard errors (the standard deviation of the sampling distribution), analysts can turn to the bootstrap. Just as the original sample was drawn from a fixed population, it is possible to treat the sample itself as a population and repeatedly draw observations from it. Each bootstrap sample is drawn with replacement and has the same sample size as the original. As an example, say a sample of ten individuals is drawn from a population and weighed for a study about dieting. The distribution of weights is displayed in the following table:
A researcher is interested in estimating the median weight and would also like to have a measure of uncertainty around that estimate. She therefore draws 50 samples with replacement from the observations in Table 1, with each new sample being equal in size to the original (N=10). The tables below show three different bootstrap samples that resulted from the resampling. Note that Person 1 appears twice in the first bootstrap sample but does not appear at all in the second, whereas Person 2 only appears in the second bootstrap sample.
The estimates resulting from B bootstrap iterations are then pooled together to construct a sampling distribution which can be used to make inferences.
To summarize, the most basic form of the bootstrap involves the following steps:
- Draw a sample of size N from a population.
- Draw B samples of size N with replacement from that sample.
- Estimate the statistic of interest θ for each of the B samples.
- Construct a sampling distribution from the estimates and use it to construct a confidence interval around θ.
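The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the weights are invented for the example (the values from Table 1 are not reproduced here):

```python
# A minimal sketch of the basic bootstrap for the median of a small sample.
import random
import statistics

random.seed(42)

sample = [61, 88, 89, 89, 90, 92, 93, 94, 98, 98]  # hypothetical weights, N = 10
B = 1000  # number of bootstrap iterations

# Steps 2-3: draw B samples of size N WITH replacement and
# estimate the median on each one.
boot_medians = [
    statistics.median(random.choices(sample, k=len(sample)))
    for _ in range(B)
]

# Step 4: the B estimates form a bootstrap sampling distribution;
# its standard deviation is the bootstrap standard error of the median.
boot_se = statistics.stdev(boot_medians)
```

`random.choices` samples with replacement, which is what distinguishes a bootstrap sample from a simple permutation of the data.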
Bootstrapping Regression Coefficients
A variation of the bootstrap is often used for estimating confidence intervals around regression coefficients. Take the simple regression model

$$Y_i = \alpha + \beta X_i + e_i$$
It would be possible to proceed in the same manner outlined above and simply resample from the original sample. However, the motivation for the bootstrap is to simulate the random process underlying an observed outcome, whereas the α and β parameters are assumed to be fixed in the population. Because the only random component in the regression model is the error term $e_i$, analysts sometimes choose to bootstrap on the residuals instead. This involves the following steps:
1. Estimate the model on the original sample and use the estimates to calculate predicted values and residuals.
2. Bootstrap from the vector of residuals and use these to construct new Y-values. That is, add a residual drawn at random, with replacement, from the set of residuals to each predicted value from step 1.
3. Regress these bootstrapped Y* values on the original $X_i$. The values of the dependent variable will be different in each bootstrap iteration, but the independent variable observations are unchanged from the original sample.
4. Construct a confidence interval based on the resulting B estimates of the regression coefficients.
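The four steps can be sketched as follows. The data are simulated (true intercept 2, true slope 0.5), and the helper function `ols` is an illustrative name, not something from the article:

```python
# A minimal sketch of bootstrapping regression residuals.
import random
import statistics

random.seed(1)

# Simulated data for the simple regression Y = alpha + beta*X + e.
x = [float(i) for i in range(20)]
y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]
n = len(x)

def ols(xs, ys):
    """Return (intercept, slope) from ordinary least squares."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Step 1: fit on the original sample; compute predicted values and residuals.
a_hat, b_hat = ols(x, y)
predicted = [a_hat + b_hat * xi for xi in x]
resid = [yi - pi for yi, pi in zip(y, predicted)]

# Steps 2-3: resample the residuals, rebuild Y*, refit on the ORIGINAL x.
B = 1000
boot_slopes = []
for _ in range(B):
    e_star = random.choices(resid, k=n)                      # drawn with replacement
    y_star = [pi + ei for pi, ei in zip(predicted, e_star)]  # Y* = predicted + e*
    boot_slopes.append(ols(x, y_star)[1])

# Step 4: the spread of the B slope estimates measures uncertainty in beta.
boot_se = statistics.stdev(boot_slopes)
```

Note that only `y_star` changes across iterations; the `x` vector is held fixed, matching step 3 above.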
In the context of observational data the choice of which approach to take – resampling from the sample or resampling from the residuals – can be ambiguous. The reason is that researchers do not have complete control over the values which variables take on, so the notion of a fixed effect is not the same as it is in experimental designs (Mooney and Duval 1993: 17). There is thus generally some randomness in the coefficients as well, which may make it justifiable to use the simpler bootstrap method described in the previous section.
Bootstrap Confidence Intervals
Confidence intervals are of fundamental importance in statistical inference, as they indicate the precision of an estimate and the certainty with which an analyst can say an estimate is different from zero. Assuming that a statistic's sampling distribution is normal, constructing a confidence interval is straightforward. Given a statistic $\hat{\theta}$, the 95% confidence interval is the following:

$$\hat{\theta} \pm 1.96 \times \widehat{SE}(\hat{\theta})$$

where $\widehat{SE}(\hat{\theta})$ is the estimated standard error of the statistic.
One way to construct a “normal-theory” confidence interval is therefore to plug in the bootstrap standard error derived from the B samples for the estimated standard error in the above equation. However, there is little advantage to using this approach if the true sampling distribution is known to be normal. Also, there may be some statistics for which normal-theory intervals include nonsensical values. The correlation coefficient, for example, ranges from -1 to +1, yet plugging in an estimate for the standard error can lead to interval endpoints outside these boundaries. Consequently it is common to rely on other types of intervals.
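The correlation example is easy to see with made-up numbers. Suppose the sample correlation is 0.95 and the bootstrap standard error (hypothetical here, not from real data) is 0.10:

```python
# Normal-theory interval: estimate +/- 1.96 * (bootstrap SE). With a correlation
# near its upper bound, the interval spills past the admissible [-1, 1] range.
r_hat, boot_se = 0.95, 0.10   # illustrative numbers only
lower = r_hat - 1.96 * boot_se
upper = r_hat + 1.96 * boot_se
print(lower, upper)  # the upper limit, 1.146, exceeds +1
```

The upper limit of roughly 1.146 is impossible for a correlation, which is exactly the kind of nonsensical value the text warns about.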
Percentile and Bias-Corrected Percentile Intervals
Normal-theory confidence intervals work by “chopping off” the tails of a normal distribution. A non-parametric analog (that is, an approach which does not rely on assuming normality) is to order the B values of the statistic of interest estimated from each bootstrap sample and remove the most extreme values. For example, it is possible to construct a 95% percentile interval based on 1,000 bootstrap iterations by ordering the 1,000 bootstrap estimates and removing the lowest 25 and highest 25 values. This guarantees that the interval will include only values that the statistic can plausibly take on.
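A percentile interval can be sketched directly; the data here are simulated for illustration:

```python
# A minimal sketch of a 95% percentile interval for the mean.
import random
import statistics

random.seed(7)
sample = [random.gauss(50, 10) for _ in range(30)]  # made-up observations

# Order 1,000 bootstrap estimates from smallest to largest.
B = 1000
boot_means = sorted(
    statistics.mean(random.choices(sample, k=len(sample))) for _ in range(B)
)

# Remove the lowest 25 and highest 25 values; the endpoints of what
# remains form the 95% percentile interval.
lower, upper = boot_means[25], boot_means[974]
```

Because every endpoint is an actual bootstrap estimate, the interval can never contain a value the statistic could not take on.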
The percentile method is deceptively simple, however, because it assumes that the bootstrap estimate is unbiased: that is, that on average the bootstrap estimate does not differ from the estimate in the original sample (which in turn is assumed to be an unbiased estimate of the population parameter). This is often an overly optimistic assumption, and so alternative methods for constructing percentile intervals have been proposed that apply corrections to the percentile estimates. The result is what is called a “bias-corrected” interval or, when an additional adjustment is made based on what is called the acceleration, a “bias-corrected and accelerated” (BCa) interval. The boot package in R and the bootstrap command in Stata make these adjustments for the user, so reporting bias-corrected intervals is no problem in practice.
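To see what the bias correction does, here is a sketch of the bias-corrected (BC) interval without the acceleration adjustment, using only Python's standard library and simulated, skewed data. In practice one would rely on the packages named above rather than this hand-rolled version:

```python
# A minimal sketch of the bias-corrected (BC) percentile interval.
import random
import statistics
from statistics import NormalDist

random.seed(3)
nd = NormalDist()

sample = [random.expovariate(1 / 50) for _ in range(40)]  # skewed made-up data
theta_hat = statistics.median(sample)

B = 2000
boot = sorted(
    statistics.median(random.choices(sample, k=len(sample))) for _ in range(B)
)

# z0 measures how far the center of the bootstrap distribution sits
# from the original estimate (z0 = 0 means no bias correction).
prop_below = sum(t < theta_hat for t in boot) / B
z0 = nd.inv_cdf(prop_below)

# The usual 2.5% and 97.5% cutoffs are shifted by the bias correction.
a1 = nd.cdf(2 * z0 + nd.inv_cdf(0.025))
a2 = nd.cdf(2 * z0 + nd.inv_cdf(0.975))
lower = boot[int(a1 * (B - 1))]
upper = boot[int(a2 * (B - 1))]
```

When the bootstrap distribution is centered exactly on the original estimate, `prop_below` is 0.5, `z0` is 0, and the interval collapses to the ordinary percentile interval.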
Efron, Bradley and Robert J. Tibshirani. 1993. An Introduction to the Bootstrap. New York: Chapman & Hall/CRC.
Mooney, Christopher Z. and Robert D. Duval. 1993. Bootstrapping: A Nonparametric Approach to Statistical Inference. Newbury Park, CA: Sage.