The quality and speed of the ML algorithm training process depends on the quality of the input data. While many algorithms are robust to irrelevant columns and data that is not normalized, some are not. For example, many models requires data inputs to be normalized to lie between 0 and 1. In this section, we will look at some quick and easy ways to preprocess data with Gota. For these examples, we will be using a dataset containing 1,035 records of the height (inch) and weight (lbs) of major league baseball players[17]. The dataset, as described on the UCLA website, consists of the following features:
- Name: Player name
- Team: The baseball team that the player was a member of
- Position: The player's position
- Height (inches): Player height
- Weight (pounds): Player weight in pounds
- Age: Player age at the time of recording
For the...