In this section, we'll look at how individual variables behave—what kind of values they take, what the distribution across those values is, and how those distributions can be represented visually.
Target Variable
The target variable can either have values that are continuous (in the case of a regression problem) or discrete (as in the case of a classification problem). The problem statement we're looking at in this chapter involves predicting whether an earthquake caused a tsunami, that is, the flag_tsunami
variable, which takes on two discrete values only—making it a classification problem.
One way of visualizing how many earthquakes resulted in tsunamis and how many didn't involves the use of a bar chart, where each bar represents a single discrete value of the variable, and the height of the bars is equal to the count of the data points having the corresponding discrete value. This gives us a good comparison of the absolute...