# Confirmatory Factor Analysis Using the SEM Package in R

This example shows how to estimate a confirmatory factor model and, on the next page, a full structural equation model (SEM) using the R `sem` package. The primary benefit of `sem` is that it is entirely *free*, requiring only an R installation. A description of how to install add-on packages in R, including `sem`, can be found here. The model to be estimated is described here.

The data for this example can be read from a remote server directly from the command prompt. Because the data file is in PASW (SPSS) format, it is necessary to load the `foreign` library and use the `read.spss` function as follows.

    > library(foreign)
    > data<-read.spss("http://www.methodsconsultants.com/data/intelligence.sav",
    +                 to.data.frame=TRUE)

The variable names can be accessed using the `names` function.

    > names(data)
    [1] "reading"   "writing"   "math"      "analytic"  "simpsons"  "familyguy" "amerdad"

The confirmatory factor model treats intelligence as a latent variable which can be measured on the basis of test scores in four areas: reading, writing, math, and analysis. Thus, only the first four variables in the file are needed. The data can be subset as follows:

    > data<-data[ ,1:4]

The brackets select rows and columns from a matrix or data frame. Leaving the position before the comma empty indicates that all rows (that is, all observations) should be retained. The `1:4` after the comma indicates that only columns (that is, variables) one through four will be retained. Traditional confirmatory factor models are estimated using only the covariance matrix of the observed variables. Once the data have been subset, the covariance matrix can be created with the `cov` function.

    > dataCov<-cov(data)

This creates a new object, `dataCov`, which is the covariance matrix of the subsetted data file.
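The subsetting and covariance steps can be tried without the remote file. The following is a minimal sketch on simulated scores; the data frame `toy`, its values, and the seed are made up for illustration and are not the intelligence data.

```r
# Simulated stand-in for the intelligence data (all values are hypothetical)
set.seed(1)
g <- rnorm(100)                          # shared component behind the four test scores
toy <- data.frame(
  reading   = g + rnorm(100, sd = 0.5),
  writing   = g + rnorm(100, sd = 0.5),
  math      = g + rnorm(100, sd = 0.5),
  analytic  = g + rnorm(100, sd = 0.5),
  simpsons  = rnorm(100),                # unrelated variables that will be dropped
  familyguy = rnorm(100),
  amerdad   = rnorm(100)
)

# Empty row index keeps all observations; 1:4 keeps the first four columns
toySub <- toy[, 1:4]

# Covariance matrix of the retained variables (4 x 4, symmetric)
toyCov <- cov(toySub)
dim(toyCov)
isSymmetric(toyCov)
```

Selecting columns by name, for example `toy[, c("reading", "writing", "math", "analytic")]`, gives the same result and does not depend on the column order of the file.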

Models are specified in `sem` by describing the paths in the path model. One-headed arrows are assumed to indicate factor loadings or regression coefficients, and two-headed arrows are assumed to indicate variances and covariances. The paths are specified in the `specify.model` function. When the function is called without an argument, the user enters the paths interactively as follows:

    > cfa<-specify.model()
    1: intell -> reading, NA, 1,
    2: intell -> writing, l2, NA
    3: intell -> math, l3, NA
    4: intell -> analytic, l4, NA
    5: reading <-> reading, d1, NA
    6: writing <-> writing, d2, NA
    7: math <-> math, d3, NA
    8: analytic <-> analytic, d4, NA
    9: intell <-> intell, p1, NA
    10: 
    Read 9 records

The user presses Enter twice after the final path to exit the interactive session. Each path takes three arguments. The first is the path itself, written with either a one-headed or a two-headed arrow. The second is a name for the free parameter on that path; if `NA` is entered instead, the path is constrained to equal the number given as the third argument. Here, the path from `intell` to `reading` is constrained to equal one, which sets the scale of the latent variable. Whenever a path is not constrained (and is instead given a name), the third argument specifies the starting value for the numeric optimizer. Entering `NA` as the third argument tells `sem` to pick its own starting value.
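The three-field convention can be made concrete with a small base-R sketch that splits a few path lines and flags which parameters are fixed and which are free. This is only an illustration of the convention; the parser and the names `spec`, `fields`, and `paths` are hypothetical and are not how `sem` reads a specification internally.

```r
# Path lines in the three-field form described above: path, name, value
spec <- c(
  "intell -> reading, NA, 1",
  "intell -> writing, l2, NA",
  "reading <-> reading, d1, NA",
  "intell <-> intell, p1, NA"
)

# Split each line on commas and trim whitespace (illustrative parser only)
fields <- do.call(rbind, lapply(strsplit(spec, ","), trimws))
paths <- data.frame(
  path  = fields[, 1],
  name  = fields[, 2],
  value = fields[, 3],
  stringsAsFactors = FALSE
)

# A name of NA means the path is fixed at the third field;
# otherwise the parameter is free and the third field is a starting value
paths$fixed <- paths$name == "NA"

# Two-headed arrows are variances/covariances; one-headed arrows are loadings
paths$type <- ifelse(grepl("<->", paths$path, fixed = TRUE),
                     "variance/covariance", "loading/regression")
print(paths)
```

In this subset only the `intell -> reading` line is fixed; the other three rows are free parameters with starting values left for the optimizer to choose.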

To avoid retyping the paths each time, the same information can be stored in a separate file, which is also easy to alter when a slightly different model is of interest. For example, a file `paths.txt` was created that looks like the following:

    intell -> reading, NA, 1,
    intell -> writing, l2, NA,
    intell -> math, l3, NA,
    intell -> analytic, l4, NA,
    reading <-> reading, d1, NA,
    writing <-> writing, d2, NA,
    math <-> math, d3, NA,
    analytic <-> analytic, d4, NA,
    intell <-> intell, p1, NA

It is saved in the directory `C:\semfiles` and can be read as follows (note that Windows paths must be written in R with forward slashes, or with doubled backslashes):

    > cfa<-specify.model("C:/semfiles/paths.txt")
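The specification file can also be written from within R, which keeps the model definition and the analysis in one script. The sketch below writes the nine paths to a temporary directory; the location and the object names `pathLines` and `pathsFile` are assumptions for illustration, and the final call is left commented out because it needs the `sem` package (later releases of the package may name the function `specifyModel`, so check the installed version).

```r
# Path specification, one path per line: path, name, value
pathLines <- c(
  "intell -> reading, NA, 1",
  "intell -> writing, l2, NA",
  "intell -> math, l3, NA",
  "intell -> analytic, l4, NA",
  "reading <-> reading, d1, NA",
  "writing <-> writing, d2, NA",
  "math <-> math, d3, NA",
  "analytic <-> analytic, d4, NA",
  "intell <-> intell, p1, NA"
)

# Hypothetical location: any writable folder would do
pathsFile <- file.path(tempdir(), "paths.txt")
writeLines(pathLines, pathsFile)

# file.path() joins components with forward slashes,
# which R accepts on every platform, including Windows
readLines(pathsFile)

# With the sem package loaded, the file could then be read in, e.g.
# cfa <- specify.model(pathsFile)
```

Building the path with `file.path` sidesteps the backslash issue entirely, since no Windows-style separator is ever typed by hand.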

In either case, the model specification is saved as an object named `cfa`. The `sem` function estimates the model.

    > cfaOut<-sem(cfa,dataCov,N=100)

The `sem` function takes three arguments: the name of the model object, the name of the covariance matrix, and the number of observations in the raw data file. To view the output, enter the name of the output object as the sole argument to the `summary` function. The output for this example looks like the following:

    > summary(cfaOut)

     Model Chisquare =  3.4973   Df =  2 Pr(>Chisq) = 0.17401
     Chisquare (null model) =  355.28   Df =  6
     Goodness-of-fit index =  0.98345
     Adjusted goodness-of-fit index =  0.91724
     RMSEA index =  0.08696   90% CI: (NA, 0.23526)
     Bentler-Bonnett NFI =  0.99016
     Tucker-Lewis NNFI =  0.98714
     Bentler CFI =  0.99571
     SRMR =  0.012067
     BIC =  -5.713

     Normalized Residuals
          Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
     -0.195000 -0.030700  0.012200 -0.000693  0.048300  0.148000

     Parameter Estimates
        Estimate Std Error z value  Pr(>|z|)
     l2 0.89458  0.070176  12.7476 0.0000e+00 writing <--- intell
     [rows for l3, l4, and d1 are missing from the source]
     d2 0.21658  0.039659   5.4610 4.7334e-08 writing <--> writing
     d3 0.16877  0.036123   4.6721 2.9808e-06 math <--> math
     d4 0.20472  0.038473   5.3210 1.0322e-07 analytic <--> analytic
     p1 0.89354  0.158570   5.6350 1.7508e-08 intell <--> intell

     Iterations =  24
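Several of the fit statistics in this output can be recomputed by hand from the two chi-square values, which helps in reading the summary. The sketch below uses the usual textbook definitions of these indices; that `sem` computes them in exactly this way is an assumption, and the variable names are illustrative.

```r
# Values copied from the summary output above
chisq  <- 3.4973; df  <- 2      # fitted model
chisq0 <- 355.28; df0 <- 6      # null (independence) model
N      <- 100                   # number of observations

# Degrees of freedom: unique variances and covariances minus free parameters
p       <- 4                    # observed variables
nFree   <- 8                    # l2, l3, l4, d1, d2, d3, d4, p1
dfCheck <- p * (p + 1) / 2 - nFree

# Chi-square test of exact fit
pValue <- pchisq(chisq, df, lower.tail = FALSE)

# Incremental fit indices compare the model with the null model
nfi  <- 1 - chisq / chisq0
nnfi <- (chisq0 / df0 - chisq / df) / (chisq0 / df0 - 1)
cfi  <- 1 - max(chisq - df, 0) / max(chisq0 - df0, chisq - df, 0)

# RMSEA and BIC penalize misfit relative to degrees of freedom and sample size
rmsea <- sqrt(max(chisq - df, 0) / (df * (N - 1)))
bic   <- chisq - df * log(N)

round(c(df = dfCheck, p = pValue, NFI = nfi, NNFI = nnfi,
        CFI = cfi, RMSEA = rmsea, BIC = bic), 5)
```

Under these definitions the results should agree with the printed summary to the displayed precision. The nonsignificant chi-square (p of about 0.17) and the incremental indices near one point to good fit, while the RMSEA is less favorable because the model has only two degrees of freedom.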

An example of a full structural equation model using the sem package is available on the next page.
