Exercises
Practice building and evaluating machine learning models in scikit-learn with the following exercises:
- Build a clustering model to distinguish between red and white wine by their chemical properties:
  a) Combine the red and white wine datasets (data/winequality-red.csv and data/winequality-white.csv, respectively) and add a column for the kind of wine (red or white).
  b) Perform some initial EDA.
  c) Build and fit a pipeline that scales the data and then uses k-means clustering to make two clusters. Be sure not to use the quality column.
  d) Use the Fowlkes-Mallows Index (the fowlkes_mallows_score() function is in sklearn.metrics) to evaluate how well k-means is able to make the distinction between red and white wine.
  e) Find the center of each cluster.
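As a starting point for steps c) through e), the following is a minimal sketch rather than a reference solution. It assumes a combined DataFrame with a label column named kind and runs on small synthetic stand-in data so that it works without the CSV files. The helper name cluster_two_kinds and the feature columns acidity and sulfur are made up for illustration and are not the names used in the wine datasets.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import fowlkes_mallows_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def cluster_two_kinds(df, label_col="kind", drop_cols=("quality",)):
    """Scale the features, fit k-means with two clusters, and score the result.

    Hypothetical helper: label_col and drop_cols are assumptions about how
    the combined DataFrame is laid out.
    """
    # Leave out the label and the quality column so neither leaks into the model.
    features = df.drop(columns=[label_col, *drop_cols], errors="ignore")
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        ("kmeans", KMeans(n_clusters=2, n_init=10, random_state=0)),
    ])
    predicted = pipeline.fit_predict(features)
    # The Fowlkes-Mallows Index compares pairings of points, so the arbitrary
    # cluster numbering (0 or 1) does not need to match the true labels.
    score = fowlkes_mallows_score(df[label_col], predicted)
    # Centers are reported in the scaled feature space, not the original units.
    centers = pd.DataFrame(
        pipeline.named_steps["kmeans"].cluster_centers_,
        columns=features.columns,
    )
    return pipeline, score, centers


# Synthetic stand-in data (NOT the real wine data; column names are invented).
rng = np.random.default_rng(0)
red = pd.DataFrame({
    "acidity": rng.normal(8.0, 0.5, 100),
    "sulfur": rng.normal(40.0, 5.0, 100),
    "quality": rng.integers(3, 9, 100),
}).assign(kind="red")
white = pd.DataFrame({
    "acidity": rng.normal(6.0, 0.5, 100),
    "sulfur": rng.normal(140.0, 10.0, 100),
    "quality": rng.integers(3, 9, 100),
}).assign(kind="white")
wine = pd.concat([red, white], ignore_index=True)

pipeline, score, centers = cluster_two_kinds(wine)
print(f"Fowlkes-Mallows Index: {score:.3f}")
print(centers)
```

For the exercise itself, the synthetic frames would be replaced by the two CSV files read with pd.read_csv(), each given its own kind value before concatenation. Because the centers come out in standardized units, one option for reading them in the original units is to pass them through the inverse_transform() method of the fitted scaler.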
- Predict star temperature:
  a) Using the data/stars.csv file, perform some initial EDA and then build a linear regression model of all the numeric columns to predict the temperature of the star.
  b) Train the model on 75% of...