Note: For a fuller treatment, download our online seminar Maximum Likelihood Estimation for Categorical Dependent Variables.
Say you are interested in predicting whether somebody is a fan of Justin Bieber based on the number of beers they have consumed and their gender. Bieber fever is coded 1 if the respondent is a fan and 0 otherwise. Because the dependent variable is dichotomous, logistic regression is an appropriate method.
The logistic regression (or logit) model is linear in the log odds of the dependent variable.
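Written out for this example, and assuming the model contains only the two predictors described above, the log odds take the form below. The symbols p, β0, β1 and β2 are notation introduced here for illustration; they are not SPSS output.

```latex
\ln\!\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1\,\text{Beers} + \beta_2\,\text{Male}
```

Here p is the probability that Bieber fever equals one, Beers is the number of beers consumed, and Male is the 0/1 gender indicator.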
Most people don’t think in terms of log odds, so it’s common to interpret the results either by exponentiating coefficients to yield odds ratios, or by computing predicted probabilities. An odds ratio greater than one means that an increase in X is associated with an increase in the odds that the dependent variable equals one; an odds ratio less than one means that the odds decrease. The predicted probabilities can be calculated using the formula for the cdf of the standard logistic distribution, where p is the probability that Bieber fever equals one, β0 is the constant, and β1 and β2 are the coefficients on beers consumed and gender:
$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1\,\text{Beers} + \beta_2\,\text{Male})}}$$
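The two interpretation routes can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anything SPSS runs: the function names are made up for this example, and the coefficients 0.5 and -0.5 are placeholder values rather than estimates from the model.

```python
import math


def logistic_cdf(log_odds: float) -> float:
    """Convert log odds to a probability using the standard logistic cdf."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_ratio(coefficient: float) -> float:
    """Exponentiate a logit coefficient to obtain an odds ratio."""
    return math.exp(coefficient)


# Log odds of zero mean even odds, i.e. a probability of one half.
print(logistic_cdf(0.0))

# Placeholder coefficients (not from the Bieber model): a positive coefficient
# gives an odds ratio above one, a negative coefficient gives one below one.
print(odds_ratio(0.5) > 1.0, odds_ratio(-0.5) < 1.0)
```

Because the logistic cdf is symmetric around zero, log odds of x and -x always give probabilities that sum to one.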
In SPSS, logistic regression can be estimated by going to Analyze → Regression → Binary Logistic.
This opens up the logistic regression dialog box. To begin, place the variable Bieber Fever in the Dependent field. Enter the two independent variables, Beers Consumed and Gender, in the Covariates field.
The logistic regression command in SPSS makes it easy to work with categorical variables having any number of categories. In this case, gender is a nominal variable that takes the value zero for females and one for males. To tell SPSS that this is a categorical variable, click on the Categorical button. This brings up the Categorical dialog box.
Bring the gender variable over to the Categorical Covariates box. The Contrast field defaults to Indicator, which is equivalent to adding dummy variables (without actually creating them by hand). The default is to use the last category as the reference. Because males are coded one, the last category here is male, so the default would be the same as adding a dummy for females. Here we want females to be the reference category, so change the reference category option to First. Be sure to also click Change, or the option won’t be saved. Finally, click Continue.
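To see what the First and Last reference options mean for a two-category variable, here is a minimal Python sketch of indicator coding. The helper name and the four gender values are invented for illustration, and SPSS's own implementation also handles variables with more than two categories, which this sketch does not.

```python
def binary_indicator(values, reference):
    """Return a 0/1 dummy: 0 for the reference category, 1 for the other one."""
    return [0 if v == reference else 1 for v in values]


# Made-up sample using the coding from the example: 0 = female, 1 = male.
gender = [0, 1, 1, 0]

# Reference = First (the lowest code, female): the dummy flags males.
male_dummy = binary_indicator(gender, reference=min(gender))

# Reference = Last (the highest code, male): the dummy flags females.
female_dummy = binary_indicator(gender, reference=max(gender))

print(male_dummy)
print(female_dummy)
```

Either choice fits the same model; only the group that the gender coefficient describes changes.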
The last step is to ask SPSS to produce predicted probabilities for each of our observations. These tell us the probability that a specific person in the sample has Bieber fever. To obtain sample predicted probabilities, click on Save. This brings up the Save dialog box.
Under Predicted Values, check the box for Probabilities, then click Continue. Click OK to estimate the model.
The results for the model are the following:
The B column displays the untransformed coefficients, which are on the log-odds scale. The coefficient on beers consumed is positive: each additional beer increases the log odds of having Bieber fever by 1.885, holding gender constant. In addition, because females are our reference category, the gender coefficient compares males with females, and it shows that the log odds of having Bieber fever are higher for males. Each estimate is statistically significant.
Because we requested that SPSS calculate predicted probabilities for each member of our sample, there is now a new variable in the data set that lists the probability that each respondent has Bieber fever. The name of this new variable is PRE_1.
It is also possible to return the predicted probability of a hypothetical, out-of-sample respondent using the formula given above. For example, the predicted probability that a male consuming 4 beers has Bieber fever is calculated as follows:
That is, the probability that a male having consumed four beers has Bieber fever is .307.
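As a rough cross-check on this figure, the reported probability can be converted back into log odds. The sketch below uses only numbers quoted in the text (the probability .307 and the beer coefficient 1.885); the constant and the gender coefficient are not restated here, so the code only backs out what their sum would have to be, and the result is approximate because .307 is rounded.

```python
import math


def logit(p: float) -> float:
    """Convert a probability to log odds (the inverse of the logistic cdf)."""
    return math.log(p / (1.0 - p))


reported_probability = 0.307  # predicted probability quoted in the text
beer_coefficient = 1.885      # B for beers consumed, quoted in the text
beers = 4

# Log odds implied by the reported probability.
log_odds = logit(reported_probability)

# Whatever is left after the beer term is the constant plus the male
# coefficient (an inference from the formula, not a value read from SPSS).
constant_plus_male = log_odds - beer_coefficient * beers

print(round(log_odds, 3))
print(round(constant_plus_male, 3))
```

The implied log odds are negative, which is consistent with a predicted probability below one half.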
Going back to the results table, the final column, Exp(B), lists odds ratios. According to these results, each additional beer multiplies the odds of having Bieber fever by more than six, holding gender constant. Because we have chosen females to be our reference category, the odds ratio for gender compares males with females, and the odds of having Bieber fever are substantially higher for males.
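The link between the B and Exp(B) columns is plain exponentiation, which can be verified with a short Python sketch. It uses only the beer coefficient quoted earlier (1.885); the two-beer line is an added illustration of how odds ratios compound, not a figure from the SPSS output.

```python
import math

beer_coefficient = 1.885  # B for beers consumed, as quoted in the text

# Odds ratio for one additional beer: roughly 6.6, i.e. "more than 6-fold".
one_beer = math.exp(beer_coefficient)

# Odds ratios multiply, so two additional beers square the one-beer ratio.
two_beers = math.exp(2 * beer_coefficient)

print(round(one_beer, 2))
print(round(two_beers, 1))
```

Note that the multiplication applies to the odds, not to the probability, so a six-fold increase in the odds does not mean a six-fold increase in the chance of having Bieber fever.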