Practice building and evaluating machine learning models in scikit-learn with the following exercises:
- Build a clustering model to distinguish between red and white wine by their chemical properties:
  - Combine the red and white wine datasets (data/winequality-red.csv and data/winequality-white.csv, respectively) and add a column for the kind of wine (red or white).
  - Perform some initial exploratory data analysis (EDA).
  - Build and fit a pipeline that scales the data and then uses k-means clustering to make two clusters. Be sure not to use the quality column.
  - Use the Fowlkes-Mallows index (the fowlkes_mallows_score() function is in sklearn.metrics) to evaluate how well k-means distinguishes between red and white wine.
  - Find the center of each cluster.
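One possible shape for the clustering steps above is sketched below. It is not a reference solution: the helper names `cluster_wines()` and `fake_wine()` are hypothetical, and the example runs on small synthetic frames (with made-up `alcohol` and `pH` columns) rather than the real CSV files, so that it is self-contained. With the real data, `red` and `white` would instead come from reading the two files with pandas. The wine kind is encoded as 0/1 before scoring, and the reported cluster centers are in scaled units because the scaler runs before k-means.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import fowlkes_mallows_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def cluster_wines(red, white, random_state=0):
    """Combine two wine frames, cluster them into two groups, and score the result."""
    # Stack the frames and record which file each row came from
    wine = pd.concat(
        [red.assign(kind='red'), white.assign(kind='white')],
        ignore_index=True
    )

    # Leave out the label we want to recover and the quality score
    X = wine.drop(columns=['kind', 'quality'])

    pipeline = Pipeline([
        ('scale', StandardScaler()),
        ('kmeans', KMeans(n_clusters=2, n_init=10, random_state=random_state))
    ])
    clusters = pipeline.fit_predict(X)

    # Compare cluster assignments against the true kind (encoded as 0/1)
    is_red = (wine['kind'] == 'red').astype(int)
    score = fowlkes_mallows_score(is_red, clusters)

    # Centers live in the scaled feature space, one row per cluster
    centers = pd.DataFrame(
        pipeline.named_steps['kmeans'].cluster_centers_, columns=X.columns
    )
    return wine, score, centers


# Synthetic stand-in data (an assumption for illustration, not the real wine data)
rng = np.random.default_rng(0)


def fake_wine(n, loc):
    return pd.DataFrame({
        'alcohol': rng.normal(loc, 0.5, n),
        'pH': rng.normal(loc / 3, 0.1, n),
        'quality': rng.integers(3, 9, n),
    })


red, white = fake_wine(50, 10.0), fake_wine(150, 13.0)
wine, score, centers = cluster_wines(red, white)
print(f'Fowlkes-Mallows index: {score:.3f}')
print(centers)
```

A Fowlkes-Mallows index close to 1 means the two clusters line up well with the red/white labels; values near 0 mean the clustering is largely unrelated to them.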
- Predict star temperature:
  - Using the data/stars.csv file, build a linear regression model of all the numeric columns to predict the...