
Structural Equation Models Using the SEM Package in R


This example shows how to estimate a full structural equation model (SEM) using the R sem package. A description of how to install add-on packages in R, including sem, can be found here. The model to be estimated is described here. To load the sem package, use the library function.

> library(sem)

The data for this example can be read from a remote server directly from the command prompt. Because the data file is in PASW (SPSS) format, it is necessary to load the foreign package and use its read.spss function as follows.

> library(foreign)
> data<-read.spss("", to.data.frame=TRUE)

The variable names can be accessed using the names function.

> names(data)
[1] "reading"   "writing"   "math"      "analytic"  "simpsons"  "familyguy" "amerdad"

The SEM consists of two measurement models, one tapping intelligence and another tapping a sense of humor. Intelligence is assumed to be a latent variable which can be measured on the basis of test scores in four areas: reading, writing, math, and analysis. Humor is assumed to be a latent variable that can be measured by how much one enjoys the shows “The Simpsons,” “American Dad,” and “Family Guy.” The model also consists of a structural path connecting intelligence to humor.

Traditional SEMs are estimated using only the covariance matrix of the observations, which can be created easily with the cov function.

> dataCov<-cov(data)

This creates a new object, dataCov, which is the covariance matrix of the observed variables.
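Because the remote data file is not reproduced here, the following minimal sketch uses simulated scores as a stand-in to show what cov returns for a seven-column data frame. The objects fakeData and fakeCov, the seed, and the generated values are assumptions for illustration only; just the variable names come from the example.

```r
# Simulated stand-in for the tutorial's data: 100 rows, 7 numeric columns.
# The values are made up; only the variable names come from the example.
set.seed(1)
vars <- c("reading", "writing", "math", "analytic",
          "simpsons", "familyguy", "amerdad")
fakeData <- as.data.frame(matrix(rnorm(100 * 7), nrow = 100,
                                 dimnames = list(NULL, vars)))

# cov() returns a symmetric 7 x 7 matrix: variances on the diagonal,
# covariances off the diagonal.
fakeCov <- cov(fakeData)
dim(fakeCov)
isSymmetric(fakeCov)

# The diagonal entries equal the column variances.
all.equal(diag(fakeCov), sapply(fakeData, var))
```

With seven observed variables, the matrix holds 7 variances and 21 distinct covariances, which is all the information a covariance-based SEM fit uses.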

Models are specified in sem by describing the paths in the path model. One-headed arrows are assumed to indicate factor loadings or regression coefficients, and two-headed arrows are assumed to indicate variances and covariances. The paths are specified in the specify.model function. When the function is called without an argument, the user enters the paths interactively as follows:

> fullsem<-specify.model()
1: humor -> simpsons, NA, 1
2: humor -> familyguy, l2, NA
3: humor -> amerdad, l3, NA
4: intell -> reading, l4, NA
5: intell -> writing, l5, NA
6: intell -> math, l6, NA
7: intell -> analytic, l7, NA
8: intell -> humor, g1, NA
9: simpsons <-> simpsons, e1, NA
10: familyguy <-> familyguy, e2, NA
11: amerdad <-> amerdad, e3, NA
12: reading <-> reading, d1, NA
13: writing <-> writing, d2, NA
14: math <-> math, d3, NA
15: analytic <-> analytic, d4, NA
16: intell <-> intell, NA, 1
17: humor <-> humor, z1, NA
Read 17 records

The user presses Enter twice after specifying the final path in order to exit the interactive session. Each path takes three arguments. The first is the path itself, written with either a one-headed or a two-headed arrow. The second gives the path a name; if NA is entered instead, the path is fixed at the number entered as the third argument. In this case, the path running from humor to simpsons is fixed at one in order to help identify the model. A second identifying constraint fixes the variance of intell at one. Whenever a path is named rather than fixed, the third argument specifies the starting value for the numeric optimizer. Entering NA as the third argument tells sem to pick its own starting value.
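The distinction between named (free) and fixed paths determines the model's degrees of freedom. As a rough check in base R, the sketch below counts the free parameters in the 17 records entered above and subtracts them from the number of unique variances and covariances among the seven observed variables. The object names (paths, fields, nFree, nMoments) are made up for this illustration and are not part of the sem package.

```r
# The 17 paths from the interactive session, one string per record.
paths <- c(
  "humor -> simpsons, NA, 1",
  "humor -> familyguy, l2, NA",
  "humor -> amerdad, l3, NA",
  "intell -> reading, l4, NA",
  "intell -> writing, l5, NA",
  "intell -> math, l6, NA",
  "intell -> analytic, l7, NA",
  "intell -> humor, g1, NA",
  "simpsons <-> simpsons, e1, NA",
  "familyguy <-> familyguy, e2, NA",
  "amerdad <-> amerdad, e3, NA",
  "reading <-> reading, d1, NA",
  "writing <-> writing, d2, NA",
  "math <-> math, d3, NA",
  "analytic <-> analytic, d4, NA",
  "intell <-> intell, NA, 1",
  "humor <-> humor, z1, NA"
)

# Split each record into its three fields: path, name, start or fixed value.
fields <- do.call(rbind, strsplit(paths, ",\\s*"))
colnames(fields) <- c("path", "name", "value")

# A path is free when it has a name; NA in the name field means it is fixed.
nFree <- sum(fields[, "name"] != "NA")

# Unique variances and covariances among the 7 observed variables.
p <- 7
nMoments <- p * (p + 1) / 2

# Degrees of freedom = unique moments minus free parameters.
c(free = nFree, moments = nMoments, df = nMoments - nFree)
```

The resulting 13 degrees of freedom agree with the Df value reported in the model summary for this example.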

To avoid retyping all of the paths each time, the same information can be stored in a separate file, which is also easy to alter if a slightly different model is of interest. For example, a file sempaths.txt was created that looks like the following:

humor -> simpsons, NA, 1,
humor -> familyguy, l2, NA,
humor -> amerdad, l3, NA,
intell -> reading, l4, NA,
intell -> writing, l5, NA,
intell -> math, l6, NA,
intell -> analytic, l7, NA,
intell -> humor, g1, NA,
simpsons <-> simpsons, e1, NA,
familyguy <-> familyguy, e2, NA,
amerdad <-> amerdad, e3, NA,
reading <-> reading, d1, NA,
writing <-> writing, d2, NA,
math <-> math, d3, NA,
analytic <-> analytic, d4, NA,
intell <-> intell, NA, 1,
humor <-> humor, z1, NA

It is saved in the directory “C:\semfiles” and can be loaded as follows (remembering that R uses forward slashes in place of backslashes in Windows pathnames):

> fullsem<-specify.model("C:/semfiles/sempaths.txt")
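One way to create and inspect such a file without leaving R is sketched below. It writes the records to a temporary file (a stand-in for C:/semfiles/sempaths.txt, so the sketch runs on any system), in the same three-field form used in the interactive session, and reads them back with read.csv to check the layout. The names pathFile and spec are assumptions for this illustration, and the read.csv step only inspects the file; it is not how specify.model parses it.

```r
# Write the path specification to a temporary file.
pathFile <- file.path(tempdir(), "sempaths.txt")
writeLines(c(
  "humor -> simpsons, NA, 1",
  "humor -> familyguy, l2, NA",
  "humor -> amerdad, l3, NA",
  "intell -> reading, l4, NA",
  "intell -> writing, l5, NA",
  "intell -> math, l6, NA",
  "intell -> analytic, l7, NA",
  "intell -> humor, g1, NA",
  "simpsons <-> simpsons, e1, NA",
  "familyguy <-> familyguy, e2, NA",
  "amerdad <-> amerdad, e3, NA",
  "reading <-> reading, d1, NA",
  "writing <-> writing, d2, NA",
  "math <-> math, d3, NA",
  "analytic <-> analytic, d4, NA",
  "intell <-> intell, NA, 1",
  "humor <-> humor, z1, NA"
), pathFile)

# Read the file back as comma-separated fields to check its layout.
spec <- read.csv(pathFile, header = FALSE, strip.white = TRUE,
                 col.names = c("path", "name", "value"),
                 na.strings = "NA", stringsAsFactors = FALSE)
nrow(spec)                 # one row per path
spec[is.na(spec$name), ]   # the two fixed (unnamed) paths

# With the sem package loaded, the file would then be passed on as in
# the tutorial:
# fullsem <- specify.model(pathFile)
```

Keeping the specification in a file makes it straightforward to copy and edit it when comparing slightly different models.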

In either case, the model specification is saved as an object named fullsem. The sem function then estimates the model.

> fullsemOut<-sem(fullsem,dataCov,N=100)

The sem function takes three arguments here: the name of the model object, the name of the covariance matrix, and the number of observations in the raw data file. To view the output, pass the output object as the sole argument to the summary function.

> summary(fullsemOut)

The output for this example looks like the following:

Model Chisquare =  13.678   Df =  13 Pr(>Chisq) = 0.39688
 Chisquare (null model) =  585.05   Df =  21
 Goodness-of-fit index =  0.96136
 Adjusted goodness-of-fit index =  0.91678
 RMSEA index =  0.022957   90% CI: (NA, 0.10366)
 Bentler-Bonnett NFI =  0.97662
 Tucker-Lewis NNFI =  0.99806
 Bentler CFI =  0.9988
 SRMR =  0.033751
 BIC =  -46.189 

 Normalized Residuals
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-7.65e-01 -1.59e-01  2.35e-06 -1.57e-02  7.98e-02  6.62e-01 

 Parameter Estimates
   Estimate Std Error z value Pr(>|z|)
l2 1.01509  0.090778  11.1821 0.0000e+00 familyguy <--- humor
e2 0.26253  0.047008   5.5848 2.3399e-08 familyguy <--> familyguy
e3 0.15184  0.039199   3.8737 1.0719e-04 amerdad <--> amerdad
d1 0.22827  0.044596   5.1187 3.0765e-07 reading <--> reading
d2 0.22160  0.040121   5.5233 3.3263e-08 writing <--> writing
d3 0.16196  0.035510   4.5611 5.0898e-06 math <--> math
d4 0.20321  0.038094   5.3344 9.5859e-08 analytic <--> analytic
z1 0.51431  0.091111   5.6449 1.6528e-08 humor <--> humor

The results mirror those produced by LISREL for the same model.
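Several of the fit statistics in the summary can be recomputed by hand from the two chi-square values, their degrees of freedom, and N. The sketch below uses common textbook formulas for these indices; they are assumed here rather than taken from the sem source code, although they appear to reproduce the reported values up to rounding.

```r
# Values copied from the summary output.
chisq     <- 13.678
df        <- 13
chisqNull <- 585.05
dfNull    <- 21
N         <- 100

# p-value of the model chi-square test.
pValue <- pchisq(chisq, df, lower.tail = FALSE)

# RMSEA, using the N - 1 form.
rmsea <- sqrt(max(chisq - df, 0) / (df * (N - 1)))

# Incremental fit indices relative to the null (independence) model.
nfi  <- 1 - chisq / chisqNull
nnfi <- (chisqNull / dfNull - chisq / df) / (chisqNull / dfNull - 1)
cfi  <- 1 - max(chisq - df, 0) / max(chisqNull - dfNull, chisq - df, 0)

# BIC in the chi-square form: chi-square minus df * log(N).
bic <- chisq - df * log(N)

round(c(p = pValue, RMSEA = rmsea, NFI = nfi, NNFI = nnfi,
        CFI = cfi, BIC = bic), 4)
```

A nonsignificant chi-square, an RMSEA near zero, and incremental indices close to one all point the same way: the model reproduces the observed covariance matrix well.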
