Distance computations play a big role in many data analytics techniques. Variables with larger values tend to dominate distance computations, so you may want to rescale the values to the range 0 to 1.
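To see why this matters, the following base R sketch compares Euclidean distances before and after rescaling. The people data frame and the minmax helper are made up for illustration and are not part of the book's data or the scales package:

```r
# Hypothetical data: two variables on very different scales.
people <- data.frame(
  Age    = c(25, 60, 26, 40),
  Income = c(50000, 50200, 50900, 52000)
)

# Hypothetical helper: min-max rescaling to [0, 1] in base R.
minmax <- function(x) (x - min(x)) / (max(x) - min(x))

# Euclidean distances on the raw values: Income differences swamp Age.
d.raw <- as.matrix(dist(people))

# Euclidean distances after rescaling every column to [0, 1].
d.scaled <- as.matrix(dist(data.frame(lapply(people, minmax))))

print(round(d.raw, 2))
print(round(d.scaled, 2))
```

On the raw values, the second person appears closer to the first than the third person does, even though their ages differ by 35 years, because the income gap drives the distance. After rescaling, the third person is the closer one.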
Rescaling a variable to a specified min-max range
Getting ready
Install the scales package, and copy the data-conversion.csv file from the book's data for this chapter into your R environment's working directory. Then load the package and read the file:
> install.packages("scales")
> library(scales)
> students <- read.csv("data-conversion.csv")
How to do it...
To rescale the Income variable to the range [0,1], use the following code snippet:
> students$Income.rescaled <- rescale(students$Income)
How it works...
By default, the rescale() function makes the lowest value(s) zero and the highest value(s) one. It rescales all the other values proportionately. The following two expressions provide identical results:
> rescale(students$Income)
> (students$Income - min(students$Income)) / (max(students$Income) - min(students$Income))
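As a small worked example of the manual expression, the following sketch applies it to a made-up income vector (the values are assumptions chosen to keep the arithmetic simple). It also shows what happens to the manual expression when a value is missing:

```r
# Hypothetical income values, chosen so the arithmetic is easy to follow.
income <- c(20000, 35000, 50000, 80000)

# Min-max formula from the text: (x - min) / (max - min).
manual <- (income - min(income)) / (max(income) - min(income))
print(manual)

# With a missing value, min() and max() return NA, so every result is NA.
income.na <- c(income, NA)
all.na <- (income.na - min(income.na)) / (max(income.na) - min(income.na))
print(all.na)

# Passing na.rm = TRUE keeps the non-missing results and leaves the NA in place.
lo <- min(income.na, na.rm = TRUE)
hi <- max(income.na, na.rm = TRUE)
kept <- (income.na - lo) / (hi - lo)
print(kept)
```

As far as we can tell from the scales documentation, rescale() computes its default input range with missing values removed, so the two expressions in the text should agree only when the column has no missing values.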
To rescale to a range other than [0,1], use the to argument. The following snippet rescales students$Income to the range [1,100]:
> rescale(students$Income, to = c(1, 100))
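The arithmetic behind a custom target range can be sketched in base R as follows. The rescale.to helper and the income values are hypothetical and only illustrate the idea; this is not how scales implements rescale():

```r
# Hypothetical helper (not part of scales): min-max rescaling to [to[1], to[2]].
rescale.to <- function(x, to = c(0, 1)) {
  # First map the values onto [0, 1] ...
  unit <- (x - min(x)) / (max(x) - min(x))
  # ... then stretch and shift them onto the target range.
  to[1] + unit * (to[2] - to[1])
}

# Made-up income values for illustration.
income <- c(20000, 35000, 50000, 80000)
scaled <- rescale.to(income, to = c(1, 100))
print(scaled)
```

The lowest income maps to the lower bound of the target range and the highest to the upper bound, with the other values spaced proportionately in between.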
There's more...
When using distance-based techniques, you may need to rescale several variables. You may find it tedious to scale one variable at a time.
Rescaling many variables at once
Use the following function to rescale variables:
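rescale.many <- function(dat, column.nos) {
  nms <- names(dat)
  for (col in column.nos) {
    # Name the new column after the original, for example Income.rescaled
    name <- paste(nms[col], ".rescaled", sep = "")
    dat[name] <- rescale(dat[, col])
  }
  # Report how many columns were rescaled
  cat(paste("Rescaled", length(column.nos), "variable(s)\n"))
  # Return the modified copy of the data frame
  dat
}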
With the preceding function defined, we can rescale the first and fourth variables in the data frame. The function returns a modified copy, so assign the result to keep the new columns:
> students <- rescale.many(students, c(1,4))
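If you prefer to select columns by name instead of by position, the same idea can be sketched in base R with lapply(). This is an alternative to rescale.many, not a replacement for it; the toy data frame, its column names, and the minmax helper are assumptions made for illustration:

```r
# Hypothetical stand-in for the students data; the column names are assumptions.
toy <- data.frame(
  Age    = c(23, 31, 45),
  State  = c("NJ", "NY", "TX"),
  Income = c(20000, 50000, 80000)
)

# Hypothetical helper: min-max rescaling to [0, 1] in base R.
minmax <- function(x) (x - min(x)) / (max(x) - min(x))

# Columns to rescale, chosen by name rather than by position.
cols <- c("Age", "Income")

# Append one .rescaled column per selected column.
toy[paste(cols, ".rescaled", sep = "")] <- lapply(toy[cols], minmax)
print(toy)
```

Selecting by name keeps the call valid if the column order of the data frame changes later.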
See also
- The Normalizing or standardizing data in a data frame recipe in this chapter.