Problem 2 – Using Python in biological data analysis
For this particular problem, we'll be using the Breast_cancer_data.csv file, which can be found on Kaggle (https://www.kaggle.com/nsaravana/breast-cancer?select=breast-cancer.csv). The file has also been uploaded to the book's GitHub repository.
When looking at data, sometimes we want to compare the values we already have, and sometimes we want to use them to make predictions with machine learning. In this case, we're going to look at another type of plot, the scatterplot, which we'll build from two specific columns in our dataset.
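As a rough illustration of plotting two columns against each other, here is a minimal sketch using pandas and matplotlib. It is not the book's own solution. The helper name `scatter_two_columns` is hypothetical, and the column names in the commented usage lines (`mean_perimeter`, `mean_texture`) are assumptions about how the CSV labels its columns, so check them against the actual file header before running.

```python
import pandas as pd
import matplotlib.pyplot as plt


def scatter_two_columns(df, x_col, y_col):
    """Draw a scatterplot of y_col against x_col and return the Axes.

    Hypothetical helper for illustration; df is any pandas DataFrame
    that contains both named columns.
    """
    fig, ax = plt.subplots()
    # One point per row: x from x_col, y from y_col
    ax.scatter(df[x_col], df[y_col], s=10, alpha=0.6)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.set_title(f"{y_col} vs. {x_col}")
    return ax


# Example usage (file location and column names are assumptions):
# df = pd.read_csv("Breast_cancer_data.csv")
# ax = scatter_two_columns(df, "mean_perimeter", "mean_texture")
# plt.show()
```

Keeping the plotting step in a small function makes it easy to swap in a different pair of columns later without rewriting the plotting code.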
Let's imagine you received this data and have already determined that mean perimeter and mean texture are better predictors than the values in the other columns. Your goal now is to create an algorithm that compares the values of those two columns using a scatterplot. For now, the goal is only to produce that scatterplot. For additional analysis and machine...