The Bootstrap in Stata


Stata offers two ways to bootstrap. Most model estimation commands (.regress, .probit, .stcox, etc.) accept the vce(bootstrap) option, which estimates bootstrap standard errors for the coefficients. There is also the .bootstrap command, which offers greater flexibility when the user wants to bootstrap a more complex expression.

Take for example a regression based on hypothetical data measuring people’s feelings towards dogs and cats. The variable dogs used a scale from 0 to 100 to measure how much respondents liked dogs, with zero representing a strong dislike and 100 representing a strong affection. The cats variable used the same scale to tap attitudes towards cats. To load the data, type the following at the Stata prompt:

. use "http://www.methodsconsultants.com/data/pets.dta", clear

Regressing dogs on cats in the usual manner produces the following results:

. regress dogs cats

      Source |       SS       df       MS              Number of obs =     500
-------------+------------------------------           F(  1,   498) = 2577.59
       Model |  340067.683     1  340067.683           Prob > F      =  0.0000
    Residual |  65702.2593   498  131.932248           R-squared     =  0.8381
-------------+------------------------------           Adj R-squared =  0.8378
       Total |  405769.942   499  813.166216           Root MSE      =  11.486

------------------------------------------------------------------------------
        dogs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cats |  -1.251126    .024643   -50.77   0.000    -1.299543   -1.202709
       _cons |   113.4563   1.314133    86.34   0.000     110.8744    116.0382
------------------------------------------------------------------------------

If the user is uncertain about the assumptions underlying the regression model, the vce(bootstrap) option can be used to obtain bootstrap standard errors for the coefficients, and bias-corrected bootstrap confidence intervals can then be retrieved with postestimation commands. The default is to draw 50 bootstrap samples, but this can be changed using the reps option.

. regress dogs cats, vce(bootstrap, reps(500) bca seed(1))
(running regress on estimation sample)

(Some output suppressed)

Linear regression                               Number of obs      =       500
                                                Replications       =       500
                                                Wald chi2(1)       =   2927.19
                                                Prob > chi2        =    0.0000
                                                R-squared          =    0.8381
                                                Adj R-squared      =    0.8378
                                                Root MSE           =   11.4862

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
        dogs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cats |  -1.251126   .0231247   -54.10   0.000     -1.29645   -1.205803
       _cons |   113.4563   1.236664    91.74   0.000     111.0325    115.8801
------------------------------------------------------------------------------

The bca option tells Stata to calculate the acceleration for each statistic, which is necessary if the user wants to use postestimation commands to retrieve the bias-corrected and accelerated (BCa) confidence interval. The seed option sets the starting point of the pseudo-random number generator so that results can be replicated exactly. If this option is not set, results will vary somewhat each time the command is run. Note that Stata resamples observations from the sample rather than resampling residuals (see “What is the bootstrap?”).
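
To make the resampling scheme concrete, here is a minimal sketch of a single bootstrap replication carried out by hand with bsample, which draws observations with replacement. It assumes the pets data are already in memory, and the seed value is arbitrary. The vce(bootstrap) option automates this step and repeats it reps() times; the sketch is not meant to reproduce the exact draws shown above.

```stata
* One bootstrap replication by hand (illustration only).
* Assumes the pets data are already loaded.

* Arbitrary seed so the draw can be repeated
set seed 1

* Keep a copy of the original sample
preserve

* Draw _N observations with replacement from the data in memory
bsample

* Re-estimate the model on the resampled data and show the slope
regress dogs cats
display _b[cats]

* Return to the original sample
restore
```

Repeating these steps many times and collecting the slopes gives the bootstrap distribution whose standard deviation is reported as the bootstrap standard error.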

The .estat bootstrap postestimation command provides alternative bootstrap confidence intervals. The all option returns the normal-theory, percentile, bias-corrected, and bias-corrected and accelerated (BCa) intervals (the latter assuming the bca option was specified when the model was run).

. estat bootstrap, all

Linear regression                               Number of obs      =       500
                                                Replications       =       500

------------------------------------------------------------------------------
             |    Observed               Bootstrap
        dogs |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cats |  -1.2511261   .0011688   .02312466    -1.29645  -1.205803   (N)
             |                                      -1.294578  -1.206655   (P)
             |                                       -1.29638  -1.207123  (BC)
             |                                      -1.295857  -1.206898 (BCa)
       _cons |   113.45631  -.0317447   1.2366639    111.0325   115.8801   (N)
             |                                       111.0854   115.9437   (P)
             |                                       111.1052   115.9454  (BC)
             |                                       111.0854   115.9437 (BCa)
------------------------------------------------------------------------------
(N)    normal confidence interval
(P)    percentile confidence interval
(BC)   bias-corrected confidence interval
(BCa)  bias-corrected and accelerated confidence interval
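
If only some of the intervals are wanted, they can be requested individually instead of with all. The lines below are a short sketch that assumes the bootstrapped regression above (estimated with the bca option) is still the active estimation result.

```stata
* Assumes the model was just estimated with vce(bootstrap, ... bca ...).

* Percentile and bias-corrected intervals only
estat bootstrap, percentile bc

* BCa interval only (requires bca at estimation time)
estat bootstrap, bca
```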

Alternatively, the .bootstrap command is available. It is prefixed to other Stata commands to facilitate bootstrapping expressions based on the saved results. The following are equivalent ways of obtaining bootstrap standard errors for the coefficients (and, through .estat bootstrap, bias-corrected intervals) after 500 bootstrap replications (some output suppressed):

. regress dogs cats, vce(bootstrap, reps(500) seed(1))
(running regress on estimation sample)

Bootstrap replications (500)



Linear regression                               Number of obs      =       500
                                                Replications       =       500
                                                Wald chi2(1)       =   2927.19
                                                Prob > chi2        =    0.0000
                                                R-squared          =    0.8381
                                                Adj R-squared      =    0.8378
                                                Root MSE           =   11.4862

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
        dogs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cats |  -1.251126   .0231247   -54.10   0.000     -1.29645   -1.205803
       _cons |   113.4563   1.236664    91.74   0.000     111.0325    115.8801
------------------------------------------------------------------------------

and

. bootstrap _b[cats] _b[_cons], reps(500) seed(1): regress dogs cats
(running regress on estimation sample)


Linear regression                               Number of obs      =       500
                                                Replications       =       500

      command:  regress dogs cats
        _bs_1:  _b[cats]
        _bs_2:  _b[_cons]

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |  -1.251126   .0231247   -54.10   0.000     -1.29645   -1.205803
       _bs_2 |   113.4563   1.236664    91.74   0.000     111.0325    115.8801
------------------------------------------------------------------------------
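
The default labels _bs_1 and _bs_2 can be replaced with more descriptive names by naming each expression. A brief sketch follows; the names slope and intercept are arbitrary choices made for this illustration.

```stata
* Label the bootstrapped statistics instead of using _bs_1 and _bs_2.
* The names slope and intercept are arbitrary.
bootstrap slope=(_b[cats]) intercept=(_b[_cons]), reps(500) seed(1): regress dogs cats
```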

The second syntax is more typing, but it is also more general. The following returns a bootstrap confidence interval around the adjusted R-squared:

. bootstrap e(r2_a), reps(500) bca seed(1): regress dogs cats
(running regress on estimation sample)


Linear regression                               Number of obs      =       500
                                                Replications       =       500

      command:  regress dogs cats
        _bs_1:  e(r2_a)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   .8377549   .0117002    71.60   0.000     .8148228    .8606869
------------------------------------------------------------------------------

. estat bootstrap, all

Linear regression                               Number of obs      =       500
                                                Replications       =       500

      command:  regress dogs cats
        _bs_1:  e(r2_a)

------------------------------------------------------------------------------
             |    Observed               Bootstrap
             |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   .83775488  -.0004866   .01170025    .8148228   .8606869   (N)
             |                                       .8138931   .8586175   (P)
             |                                       .8136719   .8579448  (BC)
             |                                       .8136447   .8576512 (BCa)
------------------------------------------------------------------------------
(N)    normal confidence interval
(P)    percentile confidence interval
(BC)   bias-corrected confidence interval
(BCa)  bias-corrected and accelerated confidence interval
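
To look at the bootstrap distribution itself, the replicates can be written to a dataset with the saving() option. The following is a sketch under a few assumptions: the file name bsreps and the statistic name r2a are arbitrary, and the percentiles computed by _pctile should be close to, though not necessarily identical to, the (P) interval reported by .estat bootstrap.

```stata
* Save the 500 replicates of the adjusted R-squared to a file.
* The file name bsreps and the name r2a are arbitrary choices.
bootstrap r2a=(e(r2_a)), reps(500) seed(1) saving(bsreps, replace): regress dogs cats

* Inspect the bootstrap distribution directly.
preserve
use bsreps, clear
summarize r2a
_pctile r2a, percentiles(2.5 97.5)
display "2.5th percentile: " r(r1) "   97.5th percentile: " r(r2)
histogram r2a
restore
```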


More complicated expressions are possible if they are included in parentheses. The following bootstraps the predicted value on the dogs scale for somebody who scores a 75 on the cats scale:

. bootstrap pred=(_b[_cons] + _b[cats]*75), reps(500) seed(1): regress dogs cats
(running regress on estimation sample)

Bootstrap replications (500)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
..................................................    50
..................................................   100
..................................................   150
..................................................   200
..................................................   250
..................................................   300
..................................................   350
..................................................   400
..................................................   450
..................................................   500

Linear regression                               Number of obs      =       500
                                                Replications       =       500

      command:  regress dogs cats
         pred:  _b[_cons] + _b[cats]*75

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pred |   19.62185   .8012859    24.49   0.000     18.05136    21.19234
------------------------------------------------------------------------------
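
For comparison, the same linear combination can be evaluated with the conventional (non-bootstrap) standard error using lincom. This is only a sketch for checking the bootstrap result against the model-based one; the point estimate should match, while the standard error and interval will generally differ somewhat.

```stata
* Model-based standard error for the same predicted value.
regress dogs cats
lincom _cons + 75*cats
```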

The .bootstrap command has several additional options, which are described with detailed examples in Stata’s Base Reference Manual [R], and they can all be incorporated into the vce() syntax as well. Most of the options provide for more complicated resampling schemes. The strata option, for example, causes the bootstrap to resample separately within each stratum. The cluster option causes whole groups, identified by an id variable, to be resampled together. It is typically advisable to use the vce option when it is available, as the estimation command will already take these other features of the data into account.
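
As an illustration of these options, the sketch below shows stratified and clustered resampling. The variables region and household are hypothetical and do not exist in the pets data; they stand in for a stratum identifier and a group identifier in a dataset that has them.

```stata
* region and household are hypothetical variables, not in the pets data.

* Resample separately within each region
bootstrap _b, reps(500) seed(1) strata(region): regress dogs cats

* Resample whole households rather than individual respondents
bootstrap _b, reps(500) seed(1) cluster(household): regress dogs cats

* The same clustered resampling requested through vce()
regress dogs cats, vce(bootstrap, reps(500) seed(1) cluster(household))
```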
