
When you’re just starting to use R, it might feel more efficient to preprocess the data in familiar programs such as MS Excel: fixing typos, adding new information, calculating means.

However, I would like to encourage you to skip this step and immediately load the data produced by your experimental software into R. Some arguments:

• Typically these programs open the real data file, which means that changes are saved in the data file itself. In addition, these programs typically don’t provide a history log, so it’s difficult to trace back changes later. In contrast, R loads a copy of the data into its working memory and does not change the data file itself. It also allows for working with scripts, which record all the manipulations performed on the data.

• Manually editing data files will introduce inconsistent errors and typos, which are hard to detect and annoying to fix.

• In R it is easier to detect systematic problems with the data, such as confounds and incorrect labeling.

### Different formats

Many different data formats can be loaded into R. Here are some examples of loading frequently used data types:

.Rdata or .rda files contain one or more R objects with names.

# load some (not existing) file in R:
load("file.rda")
# use ls() to see which objects are loaded in your workspace:
ls()

.rds files contain one R object without a name. Therefore, when read into R, the object must be stored in a variable.

# load some (not existing) file in R:
newdat <- readRDS("file.rds")
# use str() to see what object is loaded:
str(newdat)
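Here is a quick self-contained illustration of this behavior, using a temporary file and a small made-up data frame so the example does not touch your working directory:

```r
# create a temporary .rds file with a small (made-up) data frame:
tmp <- tempfile(fileext = ".rds")
saveRDS(data.frame(Subject = c("s1", "s2"), RT = c(500, 620)), tmp)

# readRDS() returns the object without a name,
# so it must be assigned to a variable:
newdat <- readRDS(tmp)
str(newdat)
```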

Text files

# for delimited text with header:
dat <- read.table("file.txt", header=TRUE)
# see the help file for other options to set:
help(read.table)


### Common problems in reading text files

Often reading text files does not work the first time. Don’t stress out, but systematically try to find out what went wrong.

R cannot find the file you are trying to read. Check your current working directory (i.e., the folder that R takes as a starting point when searching for the file) with the command getwd(). To change the working directory, use the function setwd(). Sometimes it also helps to provide the complete path to the file that you are trying to load, e.g., “C:/Documents/data.txt” on Windows or “/Users/Jacolien/Documents/data.txt” on Mac.

# This will result in an error, because the file does not exist:
dat <- read.table("file.txt")
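One way to make this failure easier to diagnose is to test for the file before reading it. The helper below is not part of the tutorial, just a sketch of the idea:

```r
# hypothetical helper: return NULL instead of raising an error
# when the file cannot be found, and report where R looked for it:
safe_read <- function(path, ...) {
  if (!file.exists(path)) {
    message("File '", path, "' not found in ", getwd())
    return(NULL)
  }
  read.table(path, ...)
}

dat <- safe_read("file.txt", header = TRUE)  # "file.txt" does not exist here
is.null(dat)
```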

Incorrect delimiter. You assumed that the text file was delimited by tabs or spaces, but instead commas were used. This is often the case with csv files, the text files exported by MS Excel. No error is given, but the output looks strange.

Download the file data-EN.csv and store it in your working directory. Inspect the file with a text editor before running the code below.

dat <- read.table("data-EN.csv", header=TRUE)
# Output does not look right:
str(dat)

Changing the delimiter with the argument sep fixes this issue:

dat <- read.table("data-EN.csv", header=TRUE, sep=',')
# output looks ok:
str(dat)
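The same problem can also be reproduced without the downloaded file. This self-contained sketch writes a small comma-separated file to a temporary location:

```r
tmp <- tempfile(fileext = ".csv")
writeLines(c("Subject,RT", "s1,500", "s2,620"), tmp)

wrong <- read.table(tmp, header = TRUE)            # whitespace assumed: 1 column
right <- read.table(tmp, header = TRUE, sep = ",") # comma delimiter: 2 columns
ncol(wrong)  # 1
ncol(right)  # 2
```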

Decimal values. The file uses a comma to indicate decimal values, e.g., 3.5 is written as 3,5. Changing the argument dec to dec=',' might help.

Download the files data-DE1.txt and data-DE2.txt and store them in your working directory. Inspect the files with a text editor before running the code below.

dat <- read.table("data-DE1.txt", header=TRUE)
# incorrect, because RT is a factor:
str(dat)

dat <- read.table("data-DE1.txt", header=TRUE, dec=',')
# fixed with dec:
str(dat)
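If you want to try the dec argument without the downloaded files, this self-contained sketch mimics a file with decimal commas:

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c("Subject RT", "s1 500,5", "s2 620,25"), tmp)

wrong <- read.table(tmp, header = TRUE)             # RT is read as text (a factor in older R)
right <- read.table(tmp, header = TRUE, dec = ",")  # RT is read as numbers
is.numeric(wrong$RT)  # FALSE
is.numeric(right$RT)  # TRUE
```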

However, when the file additionally uses a ‘.’ as thousand separator, the use of dec won’t help.

dat <- read.table("data-DE2.txt", header=TRUE, dec=',')
# incorrect, because RT is a factor:
str(dat)

Here are two examples of how this can be fixed (see XX for the use of gsub). Better still is to make sure when exporting the data that the decimal separator is a ‘.’ and that there is no thousand separator.

# more complex fix:
dat$RT <- as.character(dat$RT) # convert factors to strings
dat$RT <- gsub("\\.", "", dat$RT) # remove the thousand separators
dat$RT <- gsub("\\,", "\\.", dat$RT) # replace commas with dots
dat$RT <- as.numeric(dat$RT) # convert to numeric
str(dat)

# Or in short:
dat$RT <- as.numeric(gsub("\\,", "\\.", gsub("\\.", "", dat$RT)))
str(dat)
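You can check the two-step fix on a single made-up value: a German-style "1.234,5" should become the number 1234.5. (Here fixed = TRUE is used as an alternative to escaping the dot with \\. in the pattern.)

```r
x <- "1.234,5"
x <- gsub(".", "", x, fixed = TRUE)   # drop the thousand separator: "1234,5"
x <- gsub(",", ".", x, fixed = TRUE)  # comma to decimal point: "1234.5"
as.numeric(x)  # 1234.5
```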

Missing values are indicated by a “.” instead of being left empty. Changing the argument na.strings might help.

Download the file data-missing.txt and store it in your working directory. Inspect the file with a text editor before running the code below.

dat <- read.table("data-missing.txt", header=TRUE)
# incorrectly labeled RT as factor:
str(dat)
# fix by setting na.strings to a .:
dat <- read.table("data-missing.txt", header=TRUE, na.strings='.')
# The RT is now ok, but the NA in the subject column
# is now incorrectly considered as a new subject level:
str(dat)
# fix by setting na.strings to a . and NA:
dat <- read.table("data-missing.txt", header=TRUE, na.strings=c('.','NA'))
# now it works:
str(dat)
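Without the downloaded file, the same behavior can be reproduced with a temporary file; note that the default for na.strings is "NA" only, which is why both strings must be listed:

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c("Subject RT", "s1 500", "NA 620", "s2 ."), tmp)

dat <- read.table(tmp, header = TRUE, na.strings = c(".", "NA"))
dat$RT                 # numeric, with the "." read as missing
is.na(dat$Subject[2])  # TRUE: the "NA" subject is missing as well
```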

Missing column names

Download the file data-info1.txt and store it in your working directory. Inspect the file with a text editor before running the code below.

dat <- read.table("data-info1.txt", header=TRUE)

R considers the first column as row names, because one column name is missing in the header. Setting row.names=NULL forces row numbering instead.

dat <- read.table("data-info1.txt", header=TRUE, row.names=NULL)
head(dat)
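The effect is easy to reproduce with a made-up file whose header contains one name fewer than the data have columns:

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c("Trial RT", "s1 1 500", "s2 1 620"), tmp)

asnames <- read.table(tmp, header = TRUE)                   # first column used as row names
asrows  <- read.table(tmp, header = TRUE, row.names = NULL) # first column kept as data
ncol(asnames)  # 2
ncol(asrows)   # 3
```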

Missing data

In the file data-info2.txt the missing data point is removed completely, so that R sees only two columns instead of three for subject s2. The argument fill can be used to read the table nevertheless:

dat <- read.table("data-info2.txt", header=TRUE, fill=TRUE)
# but note that this causes new errors:
dat[4:7,]
# that need to be fixed manually:
dat[dat$Subject=='s2',]$RT <- dat[dat$Subject=='s2',]$Trial
dat[dat$Subject=='s2',]$Trial <- 1:5
dat[4:7,]
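A self-contained sketch of what fill=TRUE does: the short line is padded with NA at the end, which is why the values end up in the wrong columns and need manual fixing.

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c("Subject Trial RT", "s1 1 500", "s2 620"), tmp)  # RT missing for s2

dat <- read.table(tmp, header = TRUE, fill = TRUE)
dat$Trial[2]      # 620: actually the RT value, shifted one column to the left
is.na(dat$RT[2])  # TRUE: the padding ended up in the last column
```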

Information lines

In the file data-info3.txt some additional lines with information were added by the (fake) experiment software. Use the argument skip to skip these lines.

dat <- read.table("data-info3.txt", header=TRUE)
# Use skip (here assuming 3 information lines; adjust the number to your file):
dat <- read.table("data-info3.txt", header=TRUE, skip=3)
str(dat)
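The same idea with a made-up file (two invented information lines; in data-info3.txt the number of lines to skip may differ):

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c("Experiment: fake", "Date: today", "Subject RT", "s1 500"), tmp)

dat <- read.table(tmp, header = TRUE, skip = 2)  # skip the two information lines
str(dat)
```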

The help file of the function read.table provides information on all the different settings.

help(read.table)