In this chapter, we imported data from the UCI Machine Learning Repository, named the columns (features), and loaded them into a pandas DataFrame. We preprocessed the data and removed the ID column. We then explored the data to learn more about it. The describe function gave us summary statistics for each feature, such as the mean, maximum, minimum, and quartiles. We also created histograms to understand the distribution of each feature, and a scatterplot matrix to look for linear relationships between the variables.
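The import and preprocessing steps can be sketched in pandas as follows. This is a minimal sketch with several assumptions. It assumes the chapter used the original Wisconsin breast cancer file from UCI, which has no header row, an ID in the first column, and "?" marking missing values. The column names are our own labels. Dropping incomplete rows is only one way to handle missing values, so the chapter's own handling may differ.

```python
import pandas as pd

# Assumed source: the original UCI Wisconsin breast cancer file, which has
# no header row, an ID in the first column, and "?" for missing values.
URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "breast-cancer-wisconsin/breast-cancer-wisconsin.data")

# Column names chosen for readability (hypothetical; the raw file has none).
NAMES = ["id", "clump_thickness", "uniform_cell_size", "uniform_cell_shape",
         "marginal_adhesion", "single_epithelial_size", "bare_nuclei",
         "bland_chromatin", "normal_nucleoli", "mitoses", "class"]


def load_and_clean(source=URL):
    """Read the raw file into a DataFrame and drop the ID column."""
    df = pd.read_csv(source, names=NAMES, na_values="?")
    # The ID only labels each record; it carries no diagnostic signal.
    df = df.drop(columns=["id"])
    # One simple way to handle missing values: drop incomplete rows.
    return df.dropna()


def explore(df):
    """Return describe()'s summary: count, mean, std, min, quartiles, max."""
    return df.describe()

# Plotting (requires matplotlib):
#   df.hist(figsize=(10, 10))
#   pd.plotting.scatter_matrix(df, figsize=(18, 18))
```

Calling `explore(load_and_clean())` downloads the file and returns the summary table. Passing a local file path instead of the URL works the same way.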
We then split the dataset into a training set and a testing (validation) set. We defined some testing parameters, built a KNN classifier and an SVC (support vector classifier), and compared their results using a classification report. The report lists the precision, recall, F1 score, and support for each class, along with the overall accuracy. Finally, we built our own example cell and explored what feature values it would take for that cell to be classified as malignant or benign.
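The modeling steps could look roughly like this in scikit-learn. This is a sketch, not the chapter's exact code. The 80/20 split, the random seed of 8, `n_neighbors=5`, and the helper names `compare_models` and `classify_cell` are all our own assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report


def compare_models(X, y, test_size=0.2, seed=8):
    """Split the data, fit KNN and SVC, and report on the held-out set."""
    # Assumed testing parameters: 20% held out, fixed seed for repeatability.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed)

    models = {
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "SVM": SVC(gamma="auto"),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        print(name, "accuracy:", accuracy_score(y_test, predictions))
        # Per-class precision, recall, F1 score and support, plus overall accuracy.
        print(classification_report(y_test, predictions))
    return models


def classify_cell(model, cell):
    """Predict the class of one hand-made cell given as a list of feature values."""
    return model.predict(np.asarray(cell, dtype=float).reshape(1, -1))[0]
```

For example, `models = compare_models(df.drop(columns=["class"]), df["class"])` followed by `classify_cell(models["SVM"], [4, 2, 1, 1, 1, 2, 3, 2, 1])` classifies a hypothetical cell. It returns 2 (benign) or 4 (malignant) under the original UCI class coding. Changing individual values and re-running the prediction shows which features push the classification from benign to malignant.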
In the next chapter, you will learn about the detection of diabetes. Stay tuned for more!