Classification models
The Boston Housing dataset was great for regression because the target column took on continuous values without limit. There are many cases when the target column takes on one or two values, such as TRUE
or FALSE
, or possibly a grouping of three or more values, such as RED
, BLUE
, or GREEN
. When the target column may be split into distinct categories, the group of ML models that you should try is referred to as classification.
To make things interesting, let’s load a new dataset used to detect pulsar stars in outer space. Go to https://packt.live/33SD0IM and click on Data Folder. Then, click on HTRU2.zip, as shown:
Figure 11.8 – Dataset directory on the UCI website
The dataset consists of 17,898 potential pulsar stars in space. But what are these pulsars? Pulsar stars rotate very quickly, so they have periodic light patterns. Radio frequency interference and noise, however, are attributes that make pulsars very hard to...