A thought-provoking type of data cleaning, which may be a new concept for a data developer, is data transformation. Data transformation is a process in which the data scientist deliberately changes data values, including values you might expect to be valid, by applying some mathematical operation.
Data transformation maps data from its original format into the format expected by an application, or into a format more convenient for a particular assumption or purpose. This includes value conversions and translation functions, as well as normalizing numeric values so that they fall within a given minimum and maximum.
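As a rough illustration of the normalization mentioned above, the following sketch rescales a numeric vector so that its smallest value becomes 0 and its largest becomes 1 (often called min-max normalization). The vector `x` and the helper `min_max` are hypothetical names introduced only for this example, and the values are made up.

```r
# Hypothetical numeric vector; the values are made up for illustration.
x <- c(12, 18, 25, 40, 60)

# Min-max normalization: subtract the minimum, then divide by the range,
# so the smallest value maps to 0 and the largest maps to 1.
min_max <- function(v) {
  rng <- range(v, na.rm = TRUE)   # rng[1] is the minimum, rng[2] the maximum
  (v - rng[1]) / (rng[2] - rng[1])
}

x_scaled <- min_max(x)
print(x_scaled)
```

Note that this sketch assumes the values are not all identical; if they were, the range would be zero and the division would not produce usable results.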
Since we have already used R in this chapter, a very simple example of this process is easy to write. For example, a data scientist may decide to transform a given value to its square root:
data.dat$trans_Y <- sqrt(data.dat$Y)...
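To see the effect of such a transformation, here is a small self-contained sketch. It assumes `data.dat` is a data frame with a numeric column `Y`; the data frame built below is a hypothetical stand-in with made-up values, not the dataset used in the chapter.

```r
# Hypothetical stand-in for data.dat; the Y values are made up.
data.dat <- data.frame(Y = c(1, 4, 9, 16, 100))

# Store the square root of Y in a new column, keeping the original intact.
data.dat$trans_Y <- sqrt(data.dat$Y)

# Compare the spread of the values before and after the transformation.
print(range(data.dat$Y))
print(range(data.dat$trans_Y))
print(data.dat)
```

Keeping the transformed values in a separate column, as the original line does, leaves the source values available for comparison. Bear in mind that `sqrt()` returns `NaN` (with a warning) for negative inputs, so this particular transformation only makes sense for non-negative data.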