The most common ways to read data into R are to use functions that read raw text files, such as read.table or read.csv, or to use functions from the foreign package that recognize files created by SPSS or Stata.
Reading Raw Text Files Into R
Data stored as raw text come in several different forms. A tab-delimited file stores variables in columns separated by tabs. For example,
day        calories
Sunday     2050
Monday     1800
Tuesday    1925
Wednesday  2015
Thursday   1875
Friday     3000
Saturday   2800
The read.table function will read a tab-delimited file. For example,
> food<-read.table("http://www.methodsconsultants.com/data/food.dat", header=TRUE, sep="\t")
The first argument of the function is the location of the file. The header=TRUE argument specifies that the first row contains variable names, and sep="\t" specifies that the data are separated by tabs. If sep="" is used instead (the default for read.table), then R treats any whitespace as a column separator. Specifying sep="," reads comma-separated files. Alternatively, the functions read.delim and read.csv can be used. They differ from read.table only in the defaults of certain arguments (for example, read.csv defaults to a comma separator and read.delim to a tab separator, and both assume header=TRUE).
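The relationship between these functions can be checked with a small self-contained sketch. It writes a few rows of the food data to temporary files (the file names and the object names food_tab, food_delim and food_csv are illustrative assumptions, not part of the original example) and reads them back in three ways:

```r
# Write a few rows of the food data to a temporary tab-delimited file
tab_file <- tempfile(fileext = ".dat")
writeLines(c("day\tcalories",
             "Sunday\t2050",
             "Monday\t1800",
             "Tuesday\t1925"), tab_file)

# Explicit arguments with read.table
food_tab <- read.table(tab_file, header = TRUE, sep = "\t")

# read.delim already assumes header = TRUE and sep = "\t"
food_delim <- read.delim(tab_file)

# Save the same data as a comma-separated file and read it with read.csv
csv_file <- tempfile(fileext = ".csv")
write.csv(food_tab, csv_file, row.names = FALSE)
food_csv <- read.csv(csv_file)

str(food_tab)
```

For simple data like this, all three calls should produce the same data frame; the functions differ only in how much has to be spelled out.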
In this example, the data file is stored on a Web server and is accessed by including the full URL. If the file is local to a user’s computer, the pathname is used instead. Note that R understands forward slashes when specifying pathnames, even on Windows, where pathnames are usually written with backslashes. If backslashes are used inside an R string, each one must be doubled (for example, "C:\\data\\food.dat").
If the data are stored locally, relative pathnames will work as well. The function getwd() called with no arguments displays the current working directory, while the setwd function changes the working directory. By setting the working directory, it is possible to minimize the amount of typing that is necessary when reading and writing data from a single folder.
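As a rough illustration of working with relative pathnames, the sketch below uses R's temporary directory as a stand-in for a project folder (an assumption made so the example runs anywhere) and a hypothetical file name, food.dat:

```r
# Remember the current working directory so it can be restored later
old_dir <- getwd()

# Use the session's temporary directory as a stand-in for a data folder
setwd(tempdir())

# Create a small tab-delimited file in the new working directory
writeLines(c("day\tcalories", "Sunday\t2050"), "food.dat")

# A relative pathname is now enough to find the file
food_local <- read.table("food.dat", header = TRUE, sep = "\t")

# Restore the original working directory
setwd(old_dir)
```

Saving and restoring the original directory is a design choice rather than a requirement; it simply avoids surprising later code that relies on the previous location.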
Reading Data Saved as SPSS or Stata Files
The foreign package can be used to read in data saved as SPSS or Stata system files. To use these functions, it is first necessary to load the foreign package by calling library(foreign).
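Loading the package might look like the following. This assumes foreign is already installed, which is normally the case because it is distributed with R as a recommended package:

```r
# Attach the foreign package so that read.spss and read.dta are available
library(foreign)
```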
The read.spss function reads data saved in .sav files for SPSS. For example:
> food<-read.spss("http://www.methodsconsultants.com/data/food.sav",
+ to.data.frame=TRUE)
The first argument specifies the location of the file. If the file is local to the user’s machine, the same rules for specifying pathnames apply as for text files. Forward slashes should be used for absolute pathnames, even when running R in Windows. The user may also wish to first set the working directory using the setwd function.
An additional option is to use the file.choose() function – called with no arguments – instead of a pathname. This will open up a dialog box that can be used to navigate to the folder where the desired file resides. For example,
> food<-read.spss(file.choose(), to.data.frame=TRUE)
The to.data.frame argument is optional; when TRUE, it immediately converts the data into a data frame, which is what R understands to be a data set. Its default is FALSE, in which case read.spss returns a list. There is an additional optional argument, use.value.labels, which converts variables with value labels into factors (i.e. categorical variables) whose levels are the labels. Its default is TRUE. This should be changed to FALSE if the user wants to retain the original numeric codings.
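The effect of these two arguments can be explored with the sketch below. It assumes that the installed foreign package includes its bundled example file files/electric.sav (used in the package's own help examples); if that file is not found, the code does nothing. The object names are illustrative.

```r
library(foreign)

# Locate an example .sav file bundled with the foreign package;
# system.file() returns "" if the file is not available
sav_file <- system.file("files", "electric.sav", package = "foreign")

if (nzchar(sav_file)) {
  # Default behaviour: a list, with value labels converted to factors
  spss_list <- read.spss(sav_file)

  # A data frame, with value labels converted to factors
  spss_df <- read.spss(sav_file, to.data.frame = TRUE)

  # A data frame that keeps the original numeric codes
  spss_codes <- read.spss(sav_file, to.data.frame = TRUE,
                          use.value.labels = FALSE)

  # Compare how many columns are factors in each version
  print(sum(sapply(spss_df, is.factor)))
  print(sum(sapply(spss_codes, is.factor)))
}
```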
Reading data saved as a Stata .dta file is similar. The function is read.dta, and its first argument is again the location of the file: either a pathname or URL, or a call to file.choose() to navigate to a file stored locally.
Unlike SPSS, it is not necessary to add an option telling R to return the data as a data frame; this is done automatically. Like SPSS, however, it is necessary to change the default if the user does not want value labels to be converted into factor levels. This is done by adding convert.factors=FALSE.
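A minimal sketch of the convert.factors behaviour follows. Because no Stata file is at hand, it first creates one with write.dta, also from the foreign package; the temporary file and the small data frame are assumptions made purely for illustration.

```r
library(foreign)

# A small data set with one categorical variable
food_df <- data.frame(day = factor(c("Sunday", "Monday", "Tuesday")),
                      calories = c(2050, 1800, 1925))

# Write it out as a Stata file; factors are stored as value labels by default
dta_file <- tempfile(fileext = ".dta")
write.dta(food_df, dta_file)

# Default: value labels come back as a factor, and the result is a data frame
food_labels <- read.dta(dta_file)

# Keep the underlying numeric codes instead of the labels
food_codes <- read.dta(dta_file, convert.factors = FALSE)

str(food_labels)
str(food_codes)
```

In the second data frame the day column should hold integer codes rather than day names, which is the Stata counterpart of setting use.value.labels=FALSE in read.spss.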
Consult the help files for each of these functions for more information.