Reading data from fixed-width formatted files
In fixed-width formatted files, columns have fixed widths; if a data element does not use up the entire allotted column width, then the element is padded with spaces to make up the specified width. To read fixed-width text files, specify the columns either by column widths or by starting positions.
Getting ready
Download the files for this chapter and store the student-fwf.txt file in your R working directory.
How to do it...
Read the fixed-width formatted file as follows:
> student <- read.fwf("student-fwf.txt", widths=c(4,15,20,15,4), col.names=c("id","name","email","major","year"))
How it works...
In the student-fwf.txt file, the first column occupies 4 character positions, the second 15, and so on. The c(4,15,20,15,4) expression specifies the widths of the 5 columns in the data file.
We can use the optional col.names argument to supply our own variable names.
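As a minimal sketch of the call above, the following builds a small fixed-width file in a temporary location and reads it back. The two student records are made up for illustration and are not the contents of the real student-fwf.txt. The strip.white=TRUE argument belongs to read.table(), to which read.fwf() passes extra arguments; it is used here on the assumption that you want the padding trimmed from text fields.

```r
# Hypothetical sample records; the real student-fwf.txt may differ.
# Each field is left-justified and padded to widths 4, 15, 20, 15, 4.
fmt <- "%-4s%-15s%-20s%-15s%-4s"
lines <- sprintf(fmt,
                 c("1001", "1002"),
                 c("Ann Lee", "Raj Patel"),
                 c("ann@example.com", "raj@example.com"),
                 c("Statistics", "Biology"),
                 c("2012", "2013"))

# Write the records to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Read the file back, naming the five columns ourselves.
student <- read.fwf(path,
                    widths = c(4, 15, 20, 15, 4),
                    col.names = c("id", "name", "email", "major", "year"),
                    strip.white = TRUE)
str(student)
```

Without strip.white=TRUE, the text columns keep their trailing spaces, which can make later comparisons such as name == "Ann Lee" fail unexpectedly.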
There's more...
The read.fwf() function has several optional arguments that come in handy. We discuss a few of these, as follows:
Files with headers
To read a file that has a header row, use the following command:
> student <- read.fwf("student-fwf-header.txt", widths=c(4,15,20,15,4), header=TRUE, sep="\t", skip=2)
If header=TRUE, the first row read (after any skipped lines) is treated as the column headers. Unlike the data rows, the header row is not fixed-width: its column names must be separated by the character given in the sep argument, which is a tab by default.
The skip argument gives the number of lines to discard at the start of the file before the header is read; in this recipe, the first two lines are skipped.
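The following sketch illustrates the layout that header=TRUE and skip=2 expect: two leading lines to discard, a tab-separated header row, and then the fixed-width records. The file contents are invented for illustration and are only an assumption about what a file such as student-fwf-header.txt might look like.

```r
# Hypothetical file contents, assumed for illustration only.
fmt <- "%-4s%-15s%-20s%-15s%-4s"
records <- sprintf(fmt,
                   c("1001", "1002"),
                   c("Ann Lee", "Raj Patel"),
                   c("ann@example.com", "raj@example.com"),
                   c("Statistics", "Biology"),
                   c("2012", "2013"))

# The header row is tab-separated, not fixed-width.
header_row <- paste(c("id", "name", "email", "major", "year"), collapse = "\t")

path <- tempfile(fileext = ".txt")
writeLines(c("Student roster",            # line 1, skipped
             "Exported for illustration", # line 2, skipped
             header_row,
             records),
           path)

# skip = 2 discards the two title lines; the next line supplies column names.
student <- read.fwf(path,
                    widths = c(4, 15, 20, 15, 4),
                    header = TRUE, sep = "\t", skip = 2,
                    strip.white = TRUE)
names(student)
```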
Excluding columns from data
To exclude a column, make the column width negative. Thus, to exclude the email column, we will specify its width as -20 and also remove the column name from the col.names vector, as follows:
> student <- read.fwf("student-fwf.txt",widths=c(4,15,-20,15,4), col.names=c("id","name","major","year"))
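To see the effect of a negative width, the sketch below reads the same kind of file with the 20-character email field skipped. The records are again hypothetical, and strip.white=TRUE is an optional extra passed through to read.table().

```r
# Hypothetical sample records with widths 4, 15, 20, 15, 4.
fmt <- "%-4s%-15s%-20s%-15s%-4s"
lines <- sprintf(fmt,
                 c("1001", "1002"),
                 c("Ann Lee", "Raj Patel"),
                 c("ann@example.com", "raj@example.com"),
                 c("Statistics", "Biology"),
                 c("2012", "2013"))
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# -20 tells read.fwf() to skip 20 characters instead of reading a column,
# so col.names lists only the four columns that remain.
student <- read.fwf(path,
                    widths = c(4, 15, -20, 15, 4),
                    col.names = c("id", "name", "major", "year"),
                    strip.white = TRUE)
names(student)
```

The resulting data frame has four columns, and the major and year values still line up correctly because the skipped characters are counted when locating the later fields.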