
Estimating Logistic Regression Models in Stata


Note: For a fuller treatment, download our online seminar Maximum Likelihood Estimation for Categorical Dependent Variables.

Say you are interested in predicting whether somebody is a fan of Justin Bieber based on the amount of beer they have consumed and their gender. The dependent variable, bieber, is coded 1 if the respondent is a fan and 0 otherwise. Because the dependent variable is dichotomous, the appropriate method is logistic regression.

The logistic regression (or logit) model is linear in the log odds of the dependent variable.

$$\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1\,\text{beer} + \beta_2\,\text{gender}$$

where p is the probability that bieber equals 1.

Most people don’t think in terms of log odds, so it’s common to interpret the results either by exponentiating the coefficients to yield odds ratios or by computing predicted probabilities. An odds ratio greater than one means that an increase in X increases the odds that the dependent variable equals one; an odds ratio less than one means that an increase in X decreases those odds. Predicted probabilities can be calculated by applying the CDF of the standard logistic distribution to the linear predictor:

$$p = \frac{e^{\beta_0 + \beta_1\,\text{beer} + \beta_2\,\text{gender}}}{1 + e^{\beta_0 + \beta_1\,\text{beer} + \beta_2\,\text{gender}}}$$

In Stata, there are two main commands for fitting a logistic regression model. (Actually, there are others, but we won’t discuss the .glm and .ml commands here.) The two commands are .logit and .logistic. They estimate exactly the same model, but they report different output: the .logit command reports the untransformed beta coefficients, while the .logistic command reports odds ratios, equal to e^β.
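To see why exponentiating a coefficient gives an odds ratio, note that (a standard derivation, not specific to this example) increasing a single covariate by one unit while holding the others fixed multiplies the odds by e raised to that covariate's coefficient:

$$\frac{\text{odds}(x_1 + 1)}{\text{odds}(x_1)} = \frac{e^{\beta_0 + \beta_1 (x_1 + 1) + \beta_2 x_2}}{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}} = e^{\beta_1}$$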

The syntax for the .logit command is as follows:

. logit bieber beer gender

This produces the following output:


Iteration 0:   log likelihood = -69.234697  
Iteration 1:   log likelihood = -20.370116  
Iteration 2:   log likelihood = -19.384329  
Iteration 3:   log likelihood = -19.366223  
Iteration 4:   log likelihood = -19.366175  
Iteration 5:   log likelihood = -19.366175  

Logistic regression                               Number of obs   =        100
                                                  LR chi2(2)      =      99.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -19.366175                       Pseudo R2       =     0.7203

------------------------------------------------------------------------------
      bieber |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        beer |   1.885463   .4373108     4.31   0.000      1.02835    2.742576
      gender |   3.379684   1.179796     2.86   0.004     1.067326    5.692041
       _cons |  -11.73551   2.758776    -4.25   0.000    -17.14261   -6.328409
------------------------------------------------------------------------------


Because the coefficient on beer is positive, each additional beer increases the log odds of having Bieber fever (by 1.885). In addition, because gender is coded such that males = 1 and females = 0, the log odds of having Bieber fever are higher for males.

These coefficients are the untransformed betas from the linear model of the log odds. You can also obtain the predicted probability for, say, a male who has consumed four beers by plugging the estimates into the formula above.
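One way to do this in Stata (a sketch, assuming the model above is the most recent estimation in memory) is to combine the stored coefficients _b[] with the invlogit() function, which applies the logistic CDF:

. * invlogit(x) = exp(x)/(1+exp(x)); _b[] holds the estimated coefficients
. display invlogit(_b[_cons] + _b[beer]*4 + _b[gender]*1)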

That is, the predicted probability that a male who has consumed four beers has Bieber fever is about .307.

It is possible to recover predicted probabilities for each person in the sample using the .predict command following model estimation. This command has several options, but the default is to calculate predicted probabilities.

. predict p

This creates a new variable, p, containing the predicted probabilities.
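For example (an optional check; variable names as above), you can inspect a few of the fitted values alongside the raw data, or summarize them:

. list bieber beer gender p in 1/5
. summarize p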

To get odds ratios, use the .logistic command.


. logistic bieber beer gender

Logistic regression                               Number of obs   =        100
                                                  LR chi2(2)      =      99.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -19.366175                       Pseudo R2       =     0.7203

------------------------------------------------------------------------------
      bieber | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        beer |   6.589404   2.881618     4.31   0.000     2.796447    15.52694
      gender |   29.36148   34.64055     2.86   0.004     2.907595    296.4981
       _cons |   8.00e-06   .0000221    -4.25   0.000     3.59e-08    .0017849
------------------------------------------------------------------------------

According to these results, each additional beer multiplies the odds of having Bieber fever by more than 6. Because the gender variable is coded such that males = 1 and females = 0, the odds of having Bieber fever are substantially higher for males.
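These odds ratios are just the exponentiated .logit coefficients. If you want to verify this yourself (a quick check, not required), note that .logistic stores the untransformed coefficients in _b[], so you can exponentiate them directly:

. display exp(_b[beer])

which reproduces the 6.589 reported for beer in the table above.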

It is also possible to get odds ratios with the .logit command by adding the or option:


. logit bieber beer gender, or

Iteration 0:   log likelihood = -69.234697  
Iteration 1:   log likelihood = -20.370116  
Iteration 2:   log likelihood = -19.384329  
Iteration 3:   log likelihood = -19.366223  
Iteration 4:   log likelihood = -19.366175  
Iteration 5:   log likelihood = -19.366175  

Logistic regression                               Number of obs   =        100
                                                  LR chi2(2)      =      99.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -19.366175                       Pseudo R2       =     0.7203

------------------------------------------------------------------------------
      bieber | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        beer |   6.589404   2.881618     4.31   0.000     2.796447    15.52694
      gender |   29.36148   34.64055     2.86   0.004     2.907595    296.4981
       _cons |   8.00e-06   .0000221    -4.25   0.000     3.59e-08    .0017849
------------------------------------------------------------------------------

Likewise, it is possible to recover the untransformed coefficients by adding the coef option to the .logistic command.


. logistic bieber beer gender, coef

Logistic regression                               Number of obs   =        100
                                                  LR chi2(2)      =      99.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -19.366175                       Pseudo R2       =     0.7203

------------------------------------------------------------------------------
      bieber |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        beer |   1.885463   .4373108     4.31   0.000      1.02835    2.742576
      gender |   3.379684   1.179796     2.86   0.004     1.067326    5.692041
       _cons |  -11.73551   2.758776    -4.25   0.000    -17.14261   -6.328409
------------------------------------------------------------------------------

In either case, following model estimation with the .predict command yields predicted probabilities.
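For instance (a sketch; p2 is just a hypothetical variable name), after refitting with .logistic you would get the same predicted probabilities as before:

. logistic bieber beer gender
. predict p2
. summarize p p2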

Still have questions? Contact us!