Note: For a fuller treatment, download our lecture series Hierarchical Linear Models.
The one-way ANOVA for a completely randomized design was the following:
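The displayed equation did not survive in this copy; in standard notation, the one-way ANOVA model for a completely randomized design with J treatment levels is:

```latex
\[
y_{ij} = \mu + \alpha_j + \varepsilon_{ij}, \qquad j = 1, \dots, J,
\]
```

where $\mu$ is the grand mean, $\alpha_j$ is the effect of treatment level $j$, and $\varepsilon_{ij}$ is the error term for subject $i$ in level $j$.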
Another way of thinking about an experiment with a single J-level factor is as a regression model with J-1 dummy variables. For a three-level factor, the same model can be written as:
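The equation itself is missing here; with the control group as the reference category, the dummy-variable regression for a three-level factor takes the standard form:

```latex
\[
y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \varepsilon_i ,
\]
```

where $D_{1i}$ and $D_{2i}$ indicate the two non-reference treatment levels.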
which is simply a regression equation with dummies for two of the treatment levels and the third level (the control group) used as the reference category. Indeed, any fixed effects ANOVA can be rewritten as a regression model regardless of the number of factors, treatment levels, and interactions. For example, the 2×2 (two treatments each with two levels) fully factorial design could be written as
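The displayed equation does not survive in this copy; a conventional form for the 2×2 fully factorial design, including its interaction term, would be:

```latex
\[
y_i = \beta_0 + \beta_1 A_i + \beta_2 B_i + \beta_3 (A_i \times B_i) + \varepsilon_i
\]
```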
where Factor A and Factor B are dummies, each coded one if the subject received a given level of the corresponding treatment and zero otherwise. The equivalence between the ANOVA model and regression with dummy variables has long been known to statisticians. Even though the language and notation of ANOVA and regression are often quite distinct, both express precisely the same model. Indeed, once one begins to talk about analysis of covariance (ANCOVA), an experimental design that adds a control for a continuous variable, the differences between experimental data analysis and regression methods for observational data essentially disappear. Consequently, statisticians speak of the general linear model, or GLM (not to be confused with the generalized linear model of econometrics), of which both regression and ANOVA are special cases. In matrix notation, the GLM can be expressed simply as
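The matrix equation is missing from this copy; the GLM is conventionally written:

```latex
\[
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
\]
```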
In experimental settings, X is referred to as the design matrix, because it reflects the treatment levels to which subjects have been assigned. Note that the GLM is limited to models that contain only fixed effects. When random effects are present, the model must be expanded.
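As a concrete illustration of the ANOVA-regression equivalence described above, the following sketch fits the dummy-variable regression by least squares and recovers the group means. The data are invented for illustration:

```python
import numpy as np

# Hypothetical data: a single 3-level factor, two observations per level
y = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 11.0])
group = np.array([0, 0, 1, 1, 2, 2])  # level 0 serves as the reference category

# Design matrix X: an intercept column plus J - 1 = 2 dummy columns
X = np.column_stack([
    np.ones(len(y)),
    (group == 1).astype(float),
    (group == 2).astype(float),
])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The intercept equals the reference-group mean (3); each dummy coefficient
# equals that group's mean minus the reference mean -- exactly the
# quantities a one-way fixed effects ANOVA decomposes.
print(beta)  # [3. 3. 7.]
```

The fitted coefficients reproduce the treatment-group means, which is the sense in which the dummy regression and the ANOVA are the same model.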
The Linear Mixed Model
Just as the analysis of variance model with fixed effects can be written in matrix notation as a GLM, models with random effects have their own matrix specification:
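The specification itself is missing in this copy; the standard matrix form of the linear mixed model is:

```latex
\[
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon}
\]
```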
In the linear mixed model (LMM), X is once again the design matrix corresponding to the fixed effects. However, the LMM adds an additional design matrix Z for the random effects. The elements of u are random effects coefficients. These differ from the coefficients in β in that they are treated as random variables, meaning that they are not specific point estimates but rather are summarized according to a probability distribution (usually multivariate normal with mean zero and covariance matrix G).
In each of the experimental examples, the right-hand-side variables were always factors. In the more general GLM and LMM, however, there is no reason why the variables in X or Z cannot be continuous. When Z contains no substantive variables, so that the random effects design matrix reduces to a column of ones, the result is a random intercept model. When Z does contain variables, the result is a random slopes model.
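To make the distinction concrete, here is a small sketch (with invented data) of how Z looks in each case for two groups: one indicator column of ones per group gives a random intercept model, and appending each group's covariate values gives a random slopes model.

```python
import numpy as np

# Hypothetical data: 2 schools, 3 students each, one covariate (SES)
school = np.array([0, 0, 0, 1, 1, 1])
ses = np.array([-1.0, 0.0, 1.0, -0.5, 0.5, 1.5])

# Random intercept model: Z holds one indicator (ones) column per school
Z_intercept = np.column_stack(
    [(school == j).astype(float) for j in range(2)]
)

# Random slopes model: add each school's SES values as further columns,
# zeroed out for students belonging to the other school
Z_slopes = np.column_stack(
    [(school == j).astype(float) for j in range(2)]
    + [np.where(school == j, ses, 0.0) for j in range(2)]
)

print(Z_intercept.shape)  # (6, 2)
print(Z_slopes.shape)     # (6, 4)
```

Each column of Z is paired with one element of u, so the intercept-only Z carries one random deviation per school, while the expanded Z carries a random intercept and a random SES slope per school.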
These observations supply the key connections for understanding how the examples of experiments with random effects tie into multilevel modeling. Fixed effects ANOVA corresponds to the situation where the dependent variable is continuous and all predictor variables are categorical. This is just a special case of the GLM, however, which also encompasses models containing continuous covariates. Mixed effects ANOVA corresponds to situations where the dependent variable is continuous and the predictor variables are a mixture of fixed and random factors. This, in turn, is a special case of the linear mixed model, which also encompasses models containing continuous fixed and random effects covariates. Multilevel models contain a mixture of fixed and random factors plus covariates and are hence a type of linear mixed model. Just as regression is a generalization of fixed effects ANOVA, multilevel modeling is a generalization of mixed effects ANOVA.
Example from Raudenbush and Bryk, 2002: Chapter 4
This section shows how one can go from the regression model-building notation found in many applications of HLM (for example, Raudenbush and Bryk, 2002) to the more general matrix notation of the LMM. Once one is comfortable moving back and forth between the two notations, using mixed model procedures in general-purpose software should be straightforward.
Raudenbush and Bryk (2002: Chapter 4) develop a model for student test performance that includes both student-level and school-level variables. They begin by specifying a model at the individual level that expresses test performance as a function of a student’s socioeconomic status (SES).
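The level-1 equation has not survived in this copy; in Raudenbush and Bryk's notation it takes the form:

```latex
\[
Y_{ij} = \beta_{0j} + \beta_{1j}(\mathrm{SES})_{ij} + r_{ij}
\]
```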
The test score for the ith student in the jth school depends on a school-specific intercept β0j and a school-specific effect for the student’s SES score, β1j. They then generalize the model so that the intercept for each school depends on the average student SES level for that school and whether the school is public or private. They do the same for the slope parameter.
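The level-2 equations are missing here; using the predictors described above (school-mean SES and a public/private sector indicator, following Raudenbush and Bryk's example), they take the form:

```latex
\[
\beta_{0j} = \gamma_{00} + \gamma_{01}(\mathrm{MEAN\ SES})_j + \gamma_{02}(\mathrm{SECTOR})_j + u_{0j}
\]
\[
\beta_{1j} = \gamma_{10} + \gamma_{11}(\mathrm{MEAN\ SES})_j + \gamma_{12}(\mathrm{SECTOR})_j + u_{1j}
\]
```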
It is then possible to substitute the models for the intercept and slope into the student-level model to produce the full multilevel model:
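The substituted equation is missing from this copy; carrying out the substitution, with the same symbols as the level-1 and level-2 equations, gives:

```latex
\[
\begin{aligned}
Y_{ij} = {}& \gamma_{00} + \gamma_{01}(\mathrm{MEAN\ SES})_j + \gamma_{02}(\mathrm{SECTOR})_j + \gamma_{10}(\mathrm{SES})_{ij} \\
& + \gamma_{11}(\mathrm{MEAN\ SES})_j(\mathrm{SES})_{ij} + \gamma_{12}(\mathrm{SECTOR})_j(\mathrm{SES})_{ij} \\
& + u_{0j} + u_{1j}(\mathrm{SES})_{ij} + r_{ij}
\end{aligned}
\]
```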
This approach of building the full model through substitution of the intercept and slope models mirrors how estimation is set up in the software package HLM. It is not very revealing, however, if one wishes to use mixed model procedures in other packages. Instead, it is helpful to isolate the random effects from the fixed effects. Recall the LMM:
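The recalled equation does not survive in this copy; as before, the LMM in matrix notation is:

```latex
\[
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon}
\]
```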
The effects in the full multilevel model can be isolated to match the LMM in matrix notation as follows:
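The grouped equation is missing here; collecting the fixed and random terms of the substituted model, row ij of the LMM reads:

```latex
\[
Y_{ij} = \underbrace{\gamma_{00} + \gamma_{01}(\mathrm{MEAN\ SES})_j + \gamma_{02}(\mathrm{SECTOR})_j + \gamma_{10}(\mathrm{SES})_{ij} + \gamma_{11}(\mathrm{MEAN\ SES})_j(\mathrm{SES})_{ij} + \gamma_{12}(\mathrm{SECTOR})_j(\mathrm{SES})_{ij}}_{\text{row of } \mathbf{X}\boldsymbol{\beta}} + \underbrace{u_{0j} + u_{1j}(\mathrm{SES})_{ij}}_{\text{row of } \mathbf{Z}\mathbf{u}} + r_{ij}
\]
```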
Note that it is typical for a variable to appear both as a fixed effect in X and as a random effect in Z. In this example, the fixed effect of SES corresponds to the overall expected effect of a student's SES level on test performance; the random effect indicates whether the size of that effect varies across schools.
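As a sketch of what estimation in general-purpose software looks like outside SPSS, the following fits a model of this form (a random intercept and a random SES slope by school) with Python's statsmodels. The data are simulated, and the variable names are illustrative rather than Raudenbush and Bryk's:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate 30 schools x 20 students with a true fixed SES effect of 2.5,
# a random intercept, and a random SES slope for each school
rng = np.random.default_rng(0)
school = np.repeat(np.arange(30), 20)
ses = rng.normal(size=school.size)
u0 = rng.normal(scale=0.5, size=30)  # random intercepts (one per school)
u1 = rng.normal(scale=0.3, size=30)  # random SES slopes (one per school)
score = 12 + 2.5 * ses + u0[school] + u1[school] * ses + rng.normal(size=school.size)

df = pd.DataFrame({"score": score, "ses": ses, "school": school})

# groups= identifies the clustering; re_formula="~ses" places both the
# intercept and SES in the random effects design matrix Z
model = smf.mixedlm("score ~ ses", df, groups=df["school"], re_formula="~ses")
result = model.fit()
print(result.params["ses"])  # fixed-effect estimate of the SES slope
```

The estimated fixed effect of SES should land near the simulated value of 2.5, while the fitted variance components summarize how much the intercepts and slopes vary across schools.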
The next section demonstrates how to estimate this HLM model using SPSS.
Still have questions? Contact us!