
Confirmatory Factor Analysis Using the SEM Package in R


This example shows how to estimate a confirmatory factor model and, on the next page, a full structural equation model (SEM) using the R sem package. The primary benefit of sem is that it is entirely free, requiring only an R installation. A description of how to install add-on packages in R, including sem, can be found here. The model to be estimated is described here.

The data for this example can be read from a remote server directly from the command prompt. Because the data file is in PASW (SPSS) format, it is necessary to load the foreign library and use the read.spss function as follows.

> library(foreign)
> data<-read.spss("",
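A minimal sketch of the same import step, assuming a hypothetical local copy of the file named intelligence.sav (the file name and the object name scores are illustrative, not taken from this example). By default read.spss returns a list; passing to.data.frame = TRUE returns a data frame, which is what bracket indexing by rows and columns expects.

```r
library(foreign)  # recommended package included with most R installations

# Hypothetical local path; substitute the real location of the .sav file.
spss_file <- "intelligence.sav"

if (file.exists(spss_file)) {
  # to.data.frame = TRUE returns a data frame instead of the default list,
  # so indexing such as scores[, 1:4] works afterwards.
  scores <- read.spss(spss_file, to.data.frame = TRUE)
  print(names(scores))
} else {
  message("File not found: ", spss_file)
}
```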

The variable names can be accessed using the names function.

> names(data)
[1] "reading"   "writing"   "math"      "analytic"  "simpsons"  "familyguy" "amerdad"

The confirmatory factor model treats intelligence as a latent variable which can be measured on the basis of test scores in four areas: reading, writing, math, and analysis. Thus, only the first four variables in the file are needed. The data can be subset as follows:

> data<-data[ ,1:4]

The brackets select rows and columns from a matrix or data frame. The row index before the comma is left empty, which indicates that all rows (that is, all observations) should be retained. The 1:4 after the comma indicates that only columns (that is, variables) one through four will be retained. Traditional confirmatory factor models are estimated using only the covariance matrix of the observed variables. Once the data have been subset, the covariance matrix can be created with the cov function.

> dataCov<-cov(data)

This creates a new object, dataCov, which is the covariance matrix of the subsetted data file.
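The subsetting and covariance steps can be tried without the original file. The sketch below uses simulated data as a stand-in; the object names (scores, tests, testsCov), the seed, and the coefficients are all made up for illustration and have no connection to the real data set.

```r
set.seed(42)  # arbitrary seed, only so the illustration is reproducible

n <- 100
ability <- rnorm(n)  # made-up common factor driving the four test scores

# Simulated stand-in for the real file: four test scores plus one unrelated variable
scores <- data.frame(
  reading  = ability + rnorm(n, sd = 0.5),
  writing  = 0.9 * ability + rnorm(n, sd = 0.5),
  math     = 0.8 * ability + rnorm(n, sd = 0.5),
  analytic = 0.7 * ability + rnorm(n, sd = 0.5),
  simpsons = rnorm(n)
)

tests <- scores[, 1:4]   # empty row index keeps all rows; 1:4 keeps the first four columns
testsCov <- cov(tests)   # 4 x 4 sample covariance matrix

print(dim(testsCov))
print(round(testsCov, 2))
```

The result is a symmetric matrix with the variances of the four test scores on the diagonal and their covariances off the diagonal.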

Models are specified in sem by describing the paths in the path model. One-headed arrows are assumed to indicate factor loadings or regression coefficients, and two-headed arrows are assumed to indicate variances and covariances. The paths are specified in the specify.model function. When the function is called without an argument, the user enters the paths interactively as follows:

> cfa<-specify.model()
1: intell -> reading,	NA,	1,
2: intell -> writing, 	l2,	NA
3: intell -> math,	l3,	NA
4: intell -> analytic,	l4,	NA
5: reading <-> reading,	d1,	NA
6: writing <-> writing,	d2,	NA
7: math <-> math,	d3,	NA
8: analytic <-> analytic,	d4,	NA
9: intell <-> intell,	p1,	NA
Read 9 records

The user presses Enter twice after specifying the final path to exit the interactive session. Each path takes three arguments. The first is the path itself, indicated with either a one-headed or a two-headed arrow. The second gives the free parameter a name, unless NA is specified. If NA is entered, the path is constrained to equal the number entered as the third argument. In this case, the path from intell to reading is constrained to equal one, which sets the scale of the latent variable. Whenever a path is not constrained (and is instead given a name), the third argument specifies the starting value for the numeric optimizer. Entering NA as the third argument tells sem to pick its own starting value.
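The three-field layout can be inspected with base R alone. The sketch below reads the nine paths as comma-separated text and separates fixed from free parameters. It only illustrates the format and is not how sem itself parses a specification; the column names and helper variables are assumptions made for this example.

```r
# The nine paths as text; "->" is a loading, "<->" a variance (two-headed arrow).
spec <- c(
  "intell -> reading,      NA, 1",
  "intell -> writing,      l2, NA",
  "intell -> math,         l3, NA",
  "intell -> analytic,     l4, NA",
  "reading <-> reading,    d1, NA",
  "writing <-> writing,    d2, NA",
  "math <-> math,          d3, NA",
  "analytic <-> analytic,  d4, NA",
  "intell <-> intell,      p1, NA"
)

# Illustrative column names: path, parameter name, fixed or starting value
paths <- read.csv(text = spec, header = FALSE, strip.white = TRUE,
                  col.names = c("path", "name", "value"),
                  stringsAsFactors = FALSE)

is_fixed    <- is.na(paths$name)                        # NA name means a constrained path
is_variance <- grepl("<->", paths$path, fixed = TRUE)   # two-headed arrows

print(paths[is_fixed, ])   # the single constrained path and its fixed value
print(sum(!is_fixed))      # number of free parameters to be estimated
```

Here one path is fixed at 1 and the remaining eight are free: three loadings and five variances.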

To avoid retyping all of the paths each time, the same specification can be stored in a separate file, which is also easy to edit if a slightly different model is of interest. For example, a file paths.txt was created that looks like the following:

intell -> reading,	NA,	1,
intell -> writing, 	l2,	NA,
intell -> math,		l3,	NA,
intell -> analytic,	l4,	NA,
reading <-> reading,	d1,	NA,
writing <-> writing,	d2,	NA,
math <-> math,		d3,	NA,
analytic <-> analytic,	d4,	NA,
intell <-> intell,	p1,	NA

It is saved in the directory “C:\semfiles” and can be read as follows (remembering that Windows paths in R must be written with forward slashes, or with doubled backslashes, rather than single backslashes):

> cfa<-specify.model("C:/semfiles/paths.txt")

In either case, the model specification is saved as an object named cfa. The sem function estimates the model.

> cfaOut<-sem(cfa,dataCov,N=100)
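As a rough illustration of what estimation tries to match, a one-factor model implies a covariance matrix built from the loadings, the factor variance, and the error variances. The sketch below constructs that implied matrix in base R. All parameter values are hypothetical and chosen only for illustration; they are not estimates from this example.

```r
# Hypothetical parameter values (illustration only, not estimates from this example)
lambda <- c(reading = 1, writing = 0.9, math = 0.8, analytic = 0.7)  # loadings; first fixed at 1
phi    <- 0.9                                                        # variance of intell (p1)
theta  <- c(0.2, 0.2, 0.2, 0.2)                                      # error variances (d1..d4)

# One-factor model: Sigma = lambda * phi * lambda' + diag(theta)
Sigma <- phi * tcrossprod(lambda) + diag(theta)
dimnames(Sigma) <- list(names(lambda), names(lambda))

print(round(Sigma, 3))
```

Estimation searches for the free parameter values whose implied matrix is as close as possible to the sample covariance matrix.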

The sem function takes three arguments: the name of the model object, the name of the covariance matrix, and the number of observations in the raw data file. To view the output, enter the name of the output object as the sole argument to the summary function. The output for this example looks like the following:

> summary(cfaOut)

 Model Chisquare =  3.4973   Df =  2 Pr(>Chisq) = 0.17401
 Chisquare (null model) =  355.28   Df =  6
 Goodness-of-fit index =  0.98345
 Adjusted goodness-of-fit index =  0.91724
 RMSEA index =  0.08696   90% CI: (NA, 0.23526)
 Bentler-Bonnett NFI =  0.99016
 Tucker-Lewis NNFI =  0.98714
 Bentler CFI =  0.99571
 SRMR =  0.012067
 BIC =  -5.713 

 Normalized Residuals
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.195000 -0.030700  0.012200 -0.000693  0.048300  0.148000 

 Parameter Estimates
   Estimate Std Error z value Pr(>|z|)                         
l2 0.89458  0.070176  12.7476 0.0000e+00 writing <--- intell
d2 0.21658  0.039659   5.4610 4.7334e-08 writing <--> writing
d3 0.16877  0.036123   4.6721 2.9808e-06 math <--> math
d4 0.20472  0.038473   5.3210 1.0322e-07 analytic <--> analytic
p1 0.89354  0.158570   5.6350 1.7508e-08 intell <--> intell

 Iterations =  24
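Several of the fit statistics above can be recomputed by hand from the two chi-square values and the sample size. The sketch below uses the standard textbook formulas; it is an assumption that sem computes them in exactly this way, although the results should agree with the printed values to rounding. The variable names are illustrative.

```r
chisq      <- 3.4973   # model chi-square from the summary
chisq_null <- 355.28   # null-model chi-square from the summary
n_obs      <- 100      # N passed to sem()

p_vars  <- 4                            # observed variables
moments <- p_vars * (p_vars + 1) / 2    # 10 unique variances and covariances
n_free  <- 8                            # l2-l4, d1-d4, p1
df      <- moments - n_free             # 2
df_null <- p_vars * (p_vars - 1) / 2    # 6: the null model estimates only the variances

p_value <- pchisq(chisq, df, lower.tail = FALSE)
rmsea   <- sqrt(max((chisq / df - 1) / (n_obs - 1), 0))
nfi     <- 1 - chisq / chisq_null
nnfi    <- (chisq_null / df_null - chisq / df) / (chisq_null / df_null - 1)
cfi     <- 1 - max(chisq - df, 0) / max(chisq_null - df_null, chisq - df, 0)
bic     <- chisq - df * log(n_obs)

print(round(c(df = df, p = p_value, RMSEA = rmsea, NFI = nfi,
              NNFI = nnfi, CFI = cfi, BIC = bic), 5))
```

The degrees of freedom follow from counting: four observed variables supply ten unique variances and covariances, and the model uses eight free parameters.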

An example of a full structural equation model using the sem package is available on the next page.
