This example shows how to estimate a full structural equation model (SEM) using the R sem package. A description of how to install add-on packages in R, including sem, can be found here. The model to be estimated is described here. To load the sem package, use the library function.
> library(sem)
The data for this example can be read from a remote server directly from the command prompt. Because the data file is in PASW (SPSS) format, it is necessary to load the foreign package and use the read.spss function as follows.
> library(foreign)
> data<-read.spss("http://www.methodsconsultants.com/data/intelligence.sav", to.data.frame=TRUE)
The variable names can be accessed using the names function.
> names(data)
"reading" "writing" "math" "analytic" "simpsons" "familyguy" "amerdad"
The SEM consists of two measurement models, one tapping intelligence and another tapping a sense of humor. Intelligence is assumed to be a latent variable which can be measured on the basis of test scores in four areas: reading, writing, math, and analysis. Humor is assumed to be a latent variable that can be measured by how much one enjoys the shows “The Simpsons,” “American Dad,” and “Family Guy.” The model also consists of a structural path connecting intelligence to humor.
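The model described above can also be written as a set of equations. This is only a sketch: the Greek symbols are notation chosen here for illustration (loadings λ, structural coefficient γ, error terms ε, δ and ζ), not notation taken from the original example. The identifying constraints λ₁ = 1 and Var(intell) = 1 are imposed when the paths are specified.

```latex
\begin{align*}
\text{simpsons}  &= \lambda_1\,\text{humor}  + \varepsilon_1 \\
\text{familyguy} &= \lambda_2\,\text{humor}  + \varepsilon_2 \\
\text{amerdad}   &= \lambda_3\,\text{humor}  + \varepsilon_3 \\
\text{reading}   &= \lambda_4\,\text{intell} + \delta_1 \\
\text{writing}   &= \lambda_5\,\text{intell} + \delta_2 \\
\text{math}      &= \lambda_6\,\text{intell} + \delta_3 \\
\text{analytic}  &= \lambda_7\,\text{intell} + \delta_4 \\
\text{humor}     &= \gamma_1\,\text{intell}  + \zeta_1
\end{align*}
```

The first seven equations are the two measurement models, and the last one is the structural path from intelligence to humor.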
Traditional SEMs are estimated using only the covariance matrix of the observed variables, which can be created easily with the cov function.
> dataCov<-cov(data)
This creates a new object, dataCov, which is the covariance matrix of the observed variables.
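As a rough illustration of what cov returns, the sketch below applies it to simulated stand-in data. The objects fake and fakeCov are hypothetical names, and the values are random numbers, not the intelligence.sav data. Only the column names are borrowed from the example.

```r
# Stand-in data: simulated values, NOT the intelligence.sav data.
set.seed(1)
vars <- c("reading", "writing", "math", "analytic",
          "simpsons", "familyguy", "amerdad")
fake <- as.data.frame(matrix(rnorm(100 * length(vars)), nrow = 100,
                             dimnames = list(NULL, vars)))

# cov() on a data frame returns the variable-by-variable covariance matrix.
fakeCov <- cov(fake)

dim(fakeCov)          # one row and one column per observed variable
isSymmetric(fakeCov)  # covariance matrices are symmetric

# The diagonal holds the variance of each variable.
all.equal(unname(diag(fakeCov)), unname(sapply(fake, var)))
```

With the real data, the same checks can be run on dataCov before passing it to sem.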
Models are specified in sem by describing the paths in the path model. One-headed arrows are assumed to indicate factor loadings or regression coefficients, and two-headed arrows are assumed to indicate variances and covariances. The paths are specified in the specify.model function. When the function is called without an argument, the user enters the paths interactively as follows:
> fullsem<-specify.model()
1: humor -> simpsons, NA, 1
2: humor -> familyguy, l2, NA
3: humor -> amerdad, l3, NA
4: intell -> reading, l4, NA
5: intell -> writing, l5, NA
6: intell -> math, l6, NA
7: intell -> analytic, l7, NA
8: intell -> humor, g1, NA
9: simpsons <-> simpsons, e1, NA
10: familyguy <-> familyguy, e2, NA
11: amerdad <-> amerdad, e3, NA
12: reading <-> reading, d1, NA
13: writing <-> writing, d2, NA
14: math <-> math, d3, NA
15: analytic <-> analytic, d4, NA
16: intell <-> intell, NA, 1
17: humor <-> humor, z1, NA
18:
Read 17 records
The user presses Enter twice after specifying the final path to exit the interactive session. Each path has three comma-separated fields. The first is the path itself, indicated with either a one-headed (->) or two-headed (<->) arrow. The second gives the path's free parameter a name, unless NA is specified. If NA is entered instead, the path is constrained to equal the number entered as the third field. In this case, the path running from humor to simpsons is set equal to one in order to help identify the model. A second identifying constraint is that the variance of intell must equal one. Whenever a path is not constrained (and instead is given a name), the third field specifies the starting value for the numeric optimizer. Entering NA as the third field tells sem to pick its own starting value.
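The three-field layout makes it easy to count the free parameters and, from that, the model's degrees of freedom. The following is a minimal sketch in base R. It does not use the sem package, and the names spec, par_name, n_free and model_df are hypothetical names introduced here. It assumes the seven observed variables listed by names(data).

```r
# The 17 path lines from the specification, one string per path.
spec <- c(
  "humor -> simpsons, NA, 1",
  "humor -> familyguy, l2, NA",
  "humor -> amerdad, l3, NA",
  "intell -> reading, l4, NA",
  "intell -> writing, l5, NA",
  "intell -> math, l6, NA",
  "intell -> analytic, l7, NA",
  "intell -> humor, g1, NA",
  "simpsons <-> simpsons, e1, NA",
  "familyguy <-> familyguy, e2, NA",
  "amerdad <-> amerdad, e3, NA",
  "reading <-> reading, d1, NA",
  "writing <-> writing, d2, NA",
  "math <-> math, d3, NA",
  "analytic <-> analytic, d4, NA",
  "intell <-> intell, NA, 1",
  "humor <-> humor, z1, NA"
)

# Second field: the parameter name, or "NA" when the path is fixed.
par_name <- trimws(sapply(strsplit(spec, ","), `[`, 2))
n_free <- sum(par_name != "NA")      # 15 named (free) parameters

# Seven observed variables give 7 * 8 / 2 distinct variances and covariances.
n_obs_vars <- 7
n_moments <- n_obs_vars * (n_obs_vars + 1) / 2   # 28

model_df <- n_moments - n_free       # 13
model_df
```

Two of the 17 paths are fixed, which leaves 15 free parameters and 28 - 15 = 13 degrees of freedom.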
To avoid retyping every path each time, the same information can be stored in a separate file, which is also easy to edit when a slightly different model is of interest. For example, a file sempaths.txt was created that looks like the following:
humor -> simpsons, NA, 1
humor -> familyguy, l2, NA
humor -> amerdad, l3, NA
intell -> reading, l4, NA
intell -> writing, l5, NA
intell -> math, l6, NA
intell -> analytic, l7, NA
intell -> humor, g1, NA
simpsons <-> simpsons, e1, NA
familyguy <-> familyguy, e2, NA
amerdad <-> amerdad, e3, NA
reading <-> reading, d1, NA
writing <-> writing, d2, NA
math <-> math, d3, NA
analytic <-> analytic, d4, NA
intell <-> intell, NA, 1
humor <-> humor, z1, NA
It is saved in the directory “C:\semfiles” and can be called (remembering that R pathnames on Windows are written with forward slashes in place of backslashes) as follows:
> fullsem<-specify.model("C:/semfiles/sempaths.txt")
In either case, the model specification is saved as an object named fullsem. The sem function will estimate the model.
> out<-sem(fullsem,dataCov,N=100)
> summary(out)
The sem function takes three arguments: the name of the model object, the name of the covariance matrix, and the number of observations in the raw data file. To view the output, enter the name of the output object as the sole argument to the summary function. The output for this example looks like the following:
Model Chisquare = 13.678 Df = 13 Pr(>Chisq) = 0.39688
Chisquare (null model) = 585.05 Df = 21
Goodness-of-fit index = 0.96136
Adjusted goodness-of-fit index = 0.91678
RMSEA index = 0.022957 90% CI: (NA, 0.10366)
Bentler-Bonnett NFI = 0.97662
Tucker-Lewis NNFI = 0.99806
Bentler CFI = 0.9988
SRMR = 0.033751
BIC = -46.189

Normalized Residuals
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-7.65e-01 -1.59e-01  2.35e-06 -1.57e-02  7.98e-02  6.62e-01

Parameter Estimates
   Estimate Std Error z value Pr(>|z|)
l2 1.01509  0.090778  11.1821 0.0000e+00 familyguy <--- humor
...
e2 0.26253  0.047008   5.5848 2.3399e-08 familyguy <--> familyguy
e3 0.15184  0.039199   3.8737 1.0719e-04 amerdad <--> amerdad
d1 0.22827  0.044596   5.1187 3.0765e-07 reading <--> reading
d2 0.22160  0.040121   5.5233 3.3263e-08 writing <--> writing
d3 0.16196  0.035510   4.5611 5.0898e-06 math <--> math
d4 0.20321  0.038094   5.3344 9.5859e-08 analytic <--> analytic
z1 0.51431  0.091111   5.6449 1.6528e-08 humor <--> humor
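Several of the fit statistics in the output can be checked by hand from the two chi-square values and the sample size. The sketch below uses common textbook formulas for these indices. It is not taken from the sem source code, and the variable names are hypothetical. The inputs are the rounded values printed above, so small differences in the last digits are expected.

```r
# Values copied from the summary output (rounded as printed).
chisq  <- 13.678   # model chi-square
df     <- 13
chisq0 <- 585.05   # null-model chi-square
df0    <- 21
N      <- 100      # number of observations passed to sem

# Upper-tail probability of the model chi-square.
p_value <- pchisq(chisq, df, lower.tail = FALSE)

# Root mean square error of approximation.
rmsea <- sqrt(max((chisq - df) / (df * (N - 1)), 0))

# Incremental fit indices relative to the null model.
nfi <- 1 - chisq / chisq0
tli <- (chisq0 / df0 - chisq / df) / (chisq0 / df0 - 1)
cfi <- 1 - max(chisq - df, 0) / max(chisq0 - df0, chisq - df, 0)

# BIC in the form reported by the summary: chi-square minus df * log(N).
bic <- chisq - df * log(N)

round(c(p = p_value, RMSEA = rmsea, NFI = nfi,
        NNFI = tli, CFI = cfi, BIC = bic), 4)
```

The results should agree closely with the printed Pr(>Chisq), RMSEA, NFI, NNFI, CFI and BIC lines. The non-significant chi-square and the indices near one all point to a well-fitting model.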
The results mirror those produced by LISREL for the same model.