# Estimating Logistic Regression Models in Stata

#### Note: For a fuller treatment, download our online seminar Maximum Likelihood Estimation for Categorical Dependent Variables.

Say you are interested in predicting whether somebody is a fan of Justin Bieber based on the amount of beer they have consumed and their gender. The dependent variable, Bieber fever, is coded 1 if the respondent is a fan and 0 otherwise. Because the dependent variable is dichotomous, the appropriate method is logistic regression.

The logistic regression (or logit) model is linear in the log odds of the dependent variable.
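Written out for the two predictors used in this example, the model takes the form below. The symbols $p$, $\beta_0$, $\beta_1$, and $\beta_2$ are notation assumed here for illustration; they do not appear in the Stata output.

$$
\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1\,\text{beer} + \beta_2\,\text{gender}, \qquad p = \Pr(\text{bieber} = 1)
$$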

Most people don’t think in terms of log odds, so it’s common to interpret the results either by exponentiating the coefficients to yield odds ratios or by computing predicted probabilities. An odds ratio greater than one means that an increase in X increases the odds that the dependent variable equals one; an odds ratio less than one means that an increase in X decreases those odds. The predicted probabilities can be calculated using the formula for the cdf of the standard logistic distribution, evaluated at the linear predictor xβ (the constant plus each coefficient multiplied by its variable):

$$
\Pr(y = 1 \mid x) = \frac{e^{x\beta}}{1 + e^{x\beta}}
$$

In Stata, there are two commands for fitting a logistic regression model. (Actually, there are more, but we won’t discuss the .glm and .ml commands here.) The two commands are .logit and .logistic. They estimate exactly the same model but report different output: the .logit command reports the untransformed beta coefficients, while the .logistic command reports odds ratios, equal to e^β.

The syntax for the logit command is the following:

. logit bieber beer gender

This produces the following output:

```
Iteration 0:   log likelihood = -69.234697
Iteration 1:   log likelihood = -20.370116
Iteration 2:   log likelihood = -19.384329
Iteration 3:   log likelihood = -19.366223
Iteration 4:   log likelihood = -19.366175
Iteration 5:   log likelihood = -19.366175

Logistic regression                               Number of obs   =        100
                                                  LR chi2(2)      =      99.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -19.366175                       Pseudo R2       =     0.7203

------------------------------------------------------------------------------
      bieber |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        beer |   1.885463   .4373108     4.31   0.000      1.02835    2.742576
      gender |   3.379684   1.179796     2.86   0.004     1.067326    5.692041
       _cons |  -11.73551   2.758776    -4.25   0.000    -17.14261   -6.328409
------------------------------------------------------------------------------

```

The coefficient on beer is positive: each additional beer increases the log odds of having Bieber fever by 1.885, holding gender constant. In addition, because gender is coded such that males = 1 and females = 0, the log odds of having Bieber fever are 3.380 higher for males than for females who have consumed the same number of beers.

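These coefficients are the untransformed betas from the linear model of the log odds. The predicted probability for, say, a male who has consumed 4 beers can be computed by plugging the coefficients into the logistic cdf:

$$
\Pr(\text{bieber} = 1) = \frac{e^{-11.73551 + 1.885463(4) + 3.379684(1)}}{1 + e^{-11.73551 + 1.885463(4) + 3.379684(1)}} = \frac{e^{-0.813974}}{1 + e^{-0.813974}} \approx .307
$$

That is, the probability that a male who has consumed four beers has Bieber fever is about .307.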
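The same number can be reproduced in Stata right after the logit command. The lines below are a minimal sketch, assuming the model above is the most recent estimation and that the variables are named bieber, beer, and gender as in the output. Here invlogit() is Stata's built-in inverse logit function, and margins is one of several ways to obtain the same prediction.

```stata
* Assumes: logit bieber beer gender has just been run on the data in memory.

* Linear predictor (log odds) for a male (gender = 1) who has had 4 beers
display _b[_cons] + _b[beer]*4 + _b[gender]*1

* Convert the log odds to a probability with the inverse logit function
display invlogit(_b[_cons] + _b[beer]*4 + _b[gender]*1)

* Same prediction, reported with a standard error and confidence interval
margins, at(beer=4 gender=1)
```

With the estimates shown above, the first display should be close to -0.814 and the second close to .307.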

It is possible to recover predicted probabilities for each person in the sample using the .predict command following model estimation. This command has several options, but the default is to calculate predicted probabilities.

. predict p

This creates a new variable, p, containing the predicted probabilities.
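As a sketch of one of those options, xb stores the linear prediction (the log odds) instead of the probability. The variable names xb_hat and p_check are hypothetical, chosen for illustration, and the example assumes p was created by the predict command above.

```stata
* Assumes: a logit (or logistic) model has just been fit and p already exists.

* Linear prediction (log odds) for each observation
predict xb_hat, xb

* Transforming the log odds by hand should reproduce the default prediction
generate p_check = invlogit(xb_hat)

* Inspect the two versions; they should agree up to rounding
summarize p p_check
list bieber beer gender p p_check in 1/5
```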

To get odds ratios, use the .logistic command.

```
. logistic bieber beer gender

Logistic regression                               Number of obs   =        100
                                                  LR chi2(2)      =      99.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -19.366175                       Pseudo R2       =     0.7203

------------------------------------------------------------------------------
      bieber | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        beer |   6.589404   2.881618     4.31   0.000     2.796447    15.52694
      gender |   29.36148   34.64055     2.86   0.004     2.907595    296.4981
       _cons |   8.00e-06   .0000221    -4.25   0.000     3.59e-08    .0017849
------------------------------------------------------------------------------

```

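According to these results, each additional beer multiplies the odds of having Bieber fever by about 6.6, holding gender constant. Because the gender variable is coded such that males = 1 and females = 0, the odds of having Bieber fever are substantially higher for males: roughly 29 times the odds for females who have consumed the same number of beers, although the wide confidence interval (2.9 to 296.5) means this estimate is imprecise.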
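The link between the two sets of output can be checked by hand. Exponentiating the coefficients reported by .logit (values rounded here for display) recovers the odds ratios, and the reported standard error of an odds ratio equals the odds ratio times the standard error of its coefficient:

$$
e^{1.885463} \approx 6.589, \qquad e^{3.379684} \approx 29.36, \qquad 6.589404 \times 0.4373108 \approx 2.8816
$$

The confidence limits are the exponentiated limits of the coefficient's interval, for example $e^{1.02835} \approx 2.796$. Because odds ratios multiply, two additional beers would scale the odds by about $6.589^2 \approx 43.4$ under this model.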

It is also possible to get odds ratios with the .logit command by adding the or option:

```
. logit bieber beer gender, or

Iteration 0:   log likelihood = -69.234697
Iteration 1:   log likelihood = -20.370116
Iteration 2:   log likelihood = -19.384329
Iteration 3:   log likelihood = -19.366223
Iteration 4:   log likelihood = -19.366175
Iteration 5:   log likelihood = -19.366175

Logistic regression                               Number of obs   =        100
                                                  LR chi2(2)      =      99.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -19.366175                       Pseudo R2       =     0.7203

------------------------------------------------------------------------------
      bieber | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        beer |   6.589404   2.881618     4.31   0.000     2.796447    15.52694
      gender |   29.36148   34.64055     2.86   0.004     2.907595    296.4981
       _cons |   8.00e-06   .0000221    -4.25   0.000     3.59e-08    .0017849
------------------------------------------------------------------------------

```

Likewise, it is possible to recover the untransformed coefficients by adding the coef option to the .logistic command.

```
. logistic bieber beer gender, coef

Logistic regression                               Number of obs   =        100
                                                  LR chi2(2)      =      99.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -19.366175                       Pseudo R2       =     0.7203

------------------------------------------------------------------------------
      bieber |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        beer |   1.885463   .4373108     4.31   0.000      1.02835    2.742576
      gender |   3.379684   1.179796     2.86   0.004     1.067326    5.692041
       _cons |  -11.73551   2.758776    -4.25   0.000    -17.14261   -6.328409
------------------------------------------------------------------------------

```

In either case, following up model estimation with the predict command yields predicted probabilities.
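One way to confirm this is to fit the model both ways and compare the predictions. The sketch below assumes the same data are in memory; p_logit and p_logistic are hypothetical variable names chosen for illustration.

```stata
* Fit with logit and save the predicted probabilities
quietly logit bieber beer gender
predict p_logit

* Fit the same model with logistic and save its predictions
quietly logistic bieber beer gender
predict p_logistic

* Stops with an error if any pair of predictions differs beyond a small tolerance
assert reldif(p_logit, p_logistic) < 1e-6
```

If the assert line runs silently, the two commands produced the same predicted probabilities for every observation.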