In this section, we will see how to oversample or undersample categories of the outcome variable to improve model accuracy. We will switch to a different dataset for this: the Loan dataset, available through this book's GitHub link.
Balancing data
The need for balancing data
To demonstrate this, we will use a different dataset. Select the Var. File node on the canvas. Click the ellipsis button (...) beside the File field, navigate to where the file is located, and then select the Loan dataset:
Go to the Types tab and change the Role of the Loan field to Target. This is the variable that we will predict:
Click on Read Values, and then click on OK. In this example, we are predicting whether or not a person has a loan.
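Outside the modeling tool, balancing a two-category target such as Loan can be sketched in plain Python. This is only a minimal illustration, not how the software balances data internally. The toy records, the field name `loan`, and the helper functions `undersample` and `oversample` are all hypothetical and are not taken from the Loan dataset.

```python
import random
from collections import Counter


def group_by_target(records, target):
    """Split records into lists keyed by the value of the target field."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[target], []).append(rec)
    return groups


def undersample(records, target, seed=0):
    """Randomly discard records so every class matches the smallest class."""
    rng = random.Random(seed)
    groups = group_by_target(records, target)
    smallest = min(len(g) for g in groups.values())
    balanced = []
    for g in groups.values():
        balanced.extend(rng.sample(g, smallest))
    rng.shuffle(balanced)
    return balanced


def oversample(records, target, seed=0):
    """Randomly duplicate records so every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_target(records, target)
    largest = max(len(g) for g in groups.values())
    balanced = []
    for g in groups.values():
        balanced.extend(g)
        # Draw the missing records with replacement from the same class.
        balanced.extend(rng.choices(g, k=largest - len(g)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 90 people without a loan, 10 people with one.
records = [{"loan": "No"} for _ in range(90)] + [{"loan": "Yes"} for _ in range(10)]

under = undersample(records, "loan")
over = oversample(records, "loan")
print(Counter(r["loan"] for r in under))
print(Counter(r["loan"] for r in over))
```

With this toy data, undersampling leaves 10 records in each category and oversampling gives 90 in each. Undersampling throws information away, while oversampling repeats the same minority records, so which one is preferable depends on how much data is available.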
Let...