Estimating HLM Models Using Stata: Part 3


Note: For a fuller treatment, download our series of lectures Hierarchical Linear Models.

Random Coefficient Model

Next, R&B present a model in which student-level SES is included instead of average SES, and they treat the slope of student SES as random. One complication is that R&B present results after group-mean centering student SES. Group-mean centering means that the average SES of each student’s school is subtracted from that student’s individual SES. Unfortunately, the meanses variable is coded -1, 0, 1 and is therefore only a rough indicator of each school’s average. To get a better estimate of the school average, one can take advantage of the collapse command in Stata. The following illustrates one way of group-mean centering (note: make sure the data are loaded and the working directory is the one in which the data are stored).

. collapse (mean) meanses=ses, by(id)
. sort id
. save sesmeans, replace
. use hsb.dta, clear
. sort id
. merge m:1 id using sesmeans
. drop _merge
. gen centses=ses-meanses

The first line computes the mean of the ses variable within each cluster and replaces the current dataset with the results (one observation per school). The next line sorts the observations by the values of the cluster variable, and the third line saves the new data to the working directory as a file named sesmeans. The subsequent line reads the original data back in, and the line after it sorts the observations by cluster. The merge command tells Stata to merge the group means file into the original data. The m:1 tells Stata that the merge is many-to-one (many observations in the first file are matched to a single observation in the second). Note that when the original data already contain a variable with the same name as one in the merged file, merge keeps the original values by default, so any existing meanses variable should be dropped or renamed before merging. The penultimate line drops _merge, a variable Stata produces for diagnosing the quality of merges, and the final line subtracts the group mean from each ses observation to produce the group-mean centered variable.
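
The same centering can also be done without collapsing and merging. The following is a minimal sketch, assuming hsb.dta is in the working directory and contains the ses and id variables used above; the names meanses_alt, centses_alt, and chk are illustrative and do not appear in the original analysis.

```stata
* Sketch: group-mean centering in place, without collapse/merge.
* Assumes hsb.dta holds ses (student SES) and id (school identifier).
* meanses_alt, centses_alt and chk are hypothetical names chosen here.
use hsb.dta, clear

* School mean of ses, repeated on every student row of that school
bysort id: egen meanses_alt = mean(ses)

* Group-mean centered SES
generate centses_alt = ses - meanses_alt

* Sanity check: the centered variable should average to
* (approximately) zero within every school
bysort id: egen chk = mean(centses_alt)
assert abs(chk) < 1e-5
drop chk
```

Because egen works within the by-groups of the data in memory, no temporary file is written and there is no risk of a name clash during a merge.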

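The level-1 equation is the following, where r_{ij} is the student-level residual:

$$Y_{ij} = \beta_{0j} + \beta_{1j}\,(SES_{ij} - \overline{SES}_{\cdot j}) + r_{ij} \tag{6}$$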

The intercept β0j can be modeled as a grand mean γ00 plus random error u0j. Similarly, the slope β1j can be modeled as having a grand mean γ10 plus random error u1j.

$$\beta_{0j} = \gamma_{00} + u_{0j} \tag{7}$$

$$\beta_{1j} = \gamma_{10} + u_{1j} \tag{8}$$

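Combining (7) and (8) into (6) produces:

$$Y_{ij} = \gamma_{00} + \gamma_{10}\,(SES_{ij} - \overline{SES}_{\cdot j}) + u_{0j} + u_{1j}\,(SES_{ij} - \overline{SES}_{\cdot j}) + r_{ij}$$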

The syntax to estimate the model is the following:

xtmixed mathach centses || id: centses  , var cov(un)

The independent variable again follows the dependent variable in the fixed effects portion of the model. Including the same independent variable after the cluster variable in the random effects portion tells Stata to estimate a random slope. There are now two random effects in the model: the random intercept and the random slope of centses. By default, Stata assumes that the two random effects do not covary (see the table of covariance structures in A Review of Random Effects ANOVA Models). The cov(un) option requests an unstructured covariance matrix for the random effects, meaning that both variances and the covariance between them are estimated.
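
To see what cov(un) adds, one can fit the model under both covariance structures and compare them. This is a sketch only, assuming centses has been created as described earlier; the stored-estimate names re_indep and re_unstr are illustrative. Both fits use the same fixed effects, which is the condition under which a likelihood-ratio test on REML fits is meaningful.

```stata
* Sketch: compare independent vs. unstructured random-effects covariance.
* Assumes mathach, centses and id are in memory.
* re_indep and re_unstr are hypothetical names for the stored results.

* Default structure: intercept and slope variances only, covariance fixed at 0
xtmixed mathach centses || id: centses, var
estimates store re_indep

* Unstructured: the intercept-slope covariance is estimated as well
xtmixed mathach centses || id: centses, var cov(un)
estimates store re_unstr

* The models differ by one parameter, the covariance
lrtest re_unstr re_indep

* In Stata 13 and later the command is named mixed and defaults to ML,
* so the reml option is needed to reproduce a REML fit:
* mixed mathach centses || id: centses, reml cov(un)
```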

The output is the following:

. xtmixed mathach centses || id: centses  , var cov(un)

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log restricted-likelihood = -23357.179  
Iteration 1:   log restricted-likelihood = -23357.117  
Iteration 2:   log restricted-likelihood = -23357.117  

Computing standard errors:

Mixed-effects REML regression                   Number of obs      =      7185
Group variable: id                              Number of groups   =       160

                                                Obs per group: min =        14
                                                               avg =      44.9
                                                               max =        67

                                                Wald chi2(1)       =    292.40
Log restricted-likelihood = -23357.117          Prob > chi2        =    0.0000

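------------------------------------------------------------------------------
     mathach |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     centses |   2.193196   .1282584    17.10   0.000     1.941814    2.444578
       _cons |   12.63619   .2445044    51.68   0.000     12.15697    13.11541
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Unstructured             |
                var(centses) |   .6939782   .2807837      .3140144    1.533706
                  var(_cons) |    8.68102   1.079543      6.803277    11.07703
          cov(centses,_cons) |   .0467923   .4063748     -.7496876    .8432723
-----------------------------+------------------------------------------------
               var(Residual) |    36.7002   .6257441      35.49403    37.94736
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =  1065.68   Prob > chi2 = 0.0000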

Note: LR test is conservative and provided only for reference.

These results correspond to Table 4.4 in R&B. See also the variance-covariance components at the bottom of their page 77.
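
As a worked reading of the random-effects estimates reported above, the covariance can be converted to a correlation, and the slope variance can be turned into a range of plausible school-specific slopes. The arithmetic below uses only the numbers in the output; the 95% range relies on the usual model assumption that the school slopes are normally distributed.

```latex
% Correlation between the random intercept and the random slope
\hat{\rho} = \frac{\widehat{\mathrm{cov}}(u_{0j}, u_{1j})}
                  {\sqrt{\widehat{\mathrm{var}}(u_{0j})\,\widehat{\mathrm{var}}(u_{1j})}}
           = \frac{0.0467923}{\sqrt{8.68102 \times 0.6939782}}
           \approx \frac{0.0468}{2.4545}
           \approx 0.019

% Approximate 95% range of school-specific SES slopes
\hat{\gamma}_{10} \pm 1.96\,\sqrt{\widehat{\mathrm{var}}(u_{1j})}
           = 2.193 \pm 1.96 \times 0.833
           \approx (0.56,\ 3.83)
```

In words, school intercepts and SES slopes are essentially uncorrelated in this model, and the estimated SES slope is positive across nearly all schools even though it varies from school to school.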

The final model R&B present is an intercepts- and slopes-as-outcomes model.
