# Estimating Logistic Regression Models in Stata

#### Note: For a fuller treatment, download our online seminar Maximum Likelihood Estimation for Categorical Dependent Variables.

Say you are interested in predicting whether somebody is a fan of Justin Bieber according to the amount of beer they have consumed as well as their gender. Bieber fever is coded 1 if the respondent is a fan and zero otherwise. Because the dependent variable is dichotomous, the appropriate method is logistic regression.

The logistic regression (or logit) model is linear in the log odds of the dependent variable.

Most people don’t think in terms of log odds, so it’s common to interpret the results either by exponentiating coefficients to yield odds ratios, or by computing predicted probabilities. An odds ratio greater than one means that an increase in X increases the odds that the dependent variable equals one; an odds ratio less than one means that the odds decrease. The predicted probabilities can be calculated using the formula for the cdf of the standard logistic distribution:

P(Y = 1) = exp(Xβ) / (1 + exp(Xβ))
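As an illustration outside Stata, the logistic cdf is easy to compute directly; a minimal Python sketch (the function name is ours, for illustration only):

```python
import math

def logistic_cdf(xb):
    """Standard logistic cdf: maps a linear predictor (log odds) to a probability."""
    return math.exp(xb) / (1 + math.exp(xb))

# A linear predictor (log odds) of 0 corresponds to a probability of 0.5;
# positive log odds push the probability above 0.5, negative below.
print(logistic_cdf(0))  # 0.5
```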

In Stata, there are two commands for fitting a logistic regression model. (Actually, there are more, but we won’t discuss the `.glm` and `.ml` commands here.) The two commands are `.logit` and `.logistic`. They fit exactly the same model but report different output: `.logit` reports the untransformed beta coefficients, while `.logistic` reports odds ratios, equal to e^β.

The syntax for the logit command is the following:

`. logit bieber beer gender`

This produces the following output:

```
Iteration 0:   log likelihood = -69.234697
Iteration 1:   log likelihood = -20.370116
Iteration 2:   log likelihood = -19.384329
Iteration 3:   log likelihood = -19.366223
Iteration 4:   log likelihood = -19.366175
Iteration 5:   log likelihood = -19.366175

Logistic regression                             Number of obs   =        100
                                                LR chi2(2)      =      99.74
                                                Prob > chi2     =     0.0000
Log likelihood = -19.366175                     Pseudo R2       =     0.7203

------------------------------------------------------------------------------
      bieber |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        beer |   1.885463   .4373108     4.31   0.000      1.02835    2.742576
      gender |   3.379684   1.179796     2.86   0.004     1.067326    5.692041
       _cons |  -11.73551   2.758776    -4.25   0.000    -17.14261   -6.328409
------------------------------------------------------------------------------
```

The coefficient on beer is positive: each additional beer increases the log odds of having Bieber fever by 1.885. In addition, because gender is coded such that males = 1 and females = 0, the positive coefficient on gender means the log odds of having Bieber fever are higher for males.

These coefficients are the untransformed betas from the linear model of the log odds. To obtain the predicted probability for, say, a male who has consumed 4 beers, plug the estimates into the logistic cdf:

P(Y = 1) = exp(-11.73551 + 1.885463(4) + 3.379684(1)) / (1 + exp(-11.73551 + 1.885463(4) + 3.379684(1))) = exp(-.813974) / (1 + exp(-.813974)) ≈ .307

That is, the probability that a male having consumed four beers has Bieber fever is .307.
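As a quick check on the arithmetic (done here in Python rather than Stata), plugging the estimated coefficients into the logistic cdf reproduces the .307:

```python
import math

# Coefficients from the .logit output above
b_cons, b_beer, b_gender = -11.73551, 1.885463, 3.379684

# Linear predictor for a male (gender = 1) who has consumed 4 beers
xb = b_cons + b_beer * 4 + b_gender * 1
p = math.exp(xb) / (1 + math.exp(xb))
print(round(p, 3))  # 0.307
```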

It is possible to recover predicted probabilities for each person in the sample using the `.predict` command following model estimation. This command has several options, but the default is to calculate predicted probabilities.

`. predict p`

This creates a new variable, p, containing the predicted probabilities.

To get odds ratios, use the `.logistic` command.

```
. logistic bieber beer gender

Logistic regression                             Number of obs   =        100
                                                LR chi2(2)      =      99.74
                                                Prob > chi2     =     0.0000
Log likelihood = -19.366175                     Pseudo R2       =     0.7203

------------------------------------------------------------------------------
      bieber | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        beer |   6.589404   2.881618     4.31   0.000     2.796447    15.52694
      gender |   29.36148   34.64055     2.86   0.004     2.907595    296.4981
       _cons |   8.00e-06   .0000221    -4.25   0.000     3.59e-08    .0017849
------------------------------------------------------------------------------
```

According to these results, each additional beer leads to a more than 6-fold increase in the odds of having Bieber fever. Because the gender variable is coded such that males = 1 and females = 0, the odds of having Bieber fever are substantially higher for males.
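The odds ratios reported by `.logistic` are simply the exponentiated coefficients from the earlier `.logit` output; a quick Python check:

```python
import math

# Exponentiating the .logit coefficients recovers the .logistic odds ratios
print(math.exp(1.885463))  # beer:   ~6.589
print(math.exp(3.379684))  # gender: ~29.36
```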

It is also possible to get odds ratios with the `.logit` command by adding the `or` option:

```
. logit bieber beer gender, or

Iteration 0:   log likelihood = -69.234697
Iteration 1:   log likelihood = -20.370116
Iteration 2:   log likelihood = -19.384329
Iteration 3:   log likelihood = -19.366223
Iteration 4:   log likelihood = -19.366175
Iteration 5:   log likelihood = -19.366175

Logistic regression                             Number of obs   =        100
                                                LR chi2(2)      =      99.74
                                                Prob > chi2     =     0.0000
Log likelihood = -19.366175                     Pseudo R2       =     0.7203

------------------------------------------------------------------------------
      bieber | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        beer |   6.589404   2.881618     4.31   0.000     2.796447    15.52694
      gender |   29.36148   34.64055     2.86   0.004     2.907595    296.4981
       _cons |   8.00e-06   .0000221    -4.25   0.000     3.59e-08    .0017849
------------------------------------------------------------------------------
```

Likewise, it is possible to recover the untransformed coefficients by adding the `coef` option to the `.logistic` command.

```
. logistic bieber beer gender, coef

Logistic regression                             Number of obs   =        100
                                                LR chi2(2)      =      99.74
                                                Prob > chi2     =     0.0000
Log likelihood = -19.366175                     Pseudo R2       =     0.7203

------------------------------------------------------------------------------
      bieber |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        beer |   1.885463   .4373108     4.31   0.000      1.02835    2.742576
      gender |   3.379684   1.179796     2.86   0.004     1.067326    5.692041
       _cons |  -11.73551   2.758776    -4.25   0.000    -17.14261   -6.328409
------------------------------------------------------------------------------
```

In either case, following model estimation with the `.predict` command yields predicted probabilities.

Still have questions? Contact us!