You're reading from Comet for Data Science Enhance your ability to manage and optimize the life cycle of your data science project

Product type Paperback

Published in Aug 2022

Publisher Packt

ISBN-13 9781801814430

Length 402 pages

Edition 1st Edition

Tools

Gitlab

Concepts

Data Science

Author (1):

Angelica Lo Duca

View More author details

Table of Contents (16) Chapters

Preface

1. Section 1 – Getting Started with Comet

2. Chapter 1: An Overview of Comet FREE CHAPTER

3. Chapter 2: Exploratory Data Analysis in Comet

4. Chapter 3: Model Evaluation in Comet

5. Section 2 – A Deep Dive into Comet

6. Chapter 4: Workspaces, Projects, Experiments, and Models

7. Chapter 5: Building a Narrative in Comet

8. Chapter 6: Integrating Comet into DevOps

9. Chapter 7: Extending the GitLab DevOps Platform with Comet

10. Section 3 – Examples and Use Cases

11. Chapter 8: Comet for Machine Learning

12. Chapter 9: Comet for Natural Language Processing

13. Chapter 10: Comet for Deep Learning

14. Chapter 11: Comet for Time Series Analysis

15. Other Books You May Enjoy

Second use case – simple linear regression

The objective of this example is to show how to log a metric in Comet. In detail, we set up an experiment that calculates the different values of Mean Squared Error (MSE) produced by fitting a linear regression model with different training sets. Every training set is derived from the same original dataset, by specifying a different seed.

You can download the full code of this example from the official GitHub repository of the book, available at the following link: https://github.com/PacktPublishing/Comet-for-Data-Science/tree/main/01.

We use the scikit-learn Python package to implement a linear regression model. For this use case, we will use the diabetes dataset, provided by scikit-learn.

We organize the experiment in three steps:

Initialize the context.
Define, fit, and evaluate the model.
Show results in Comet.

Let's start with the first step: initializing the context.

Initializing the context

Firstly, we create a .comet.config file in our working directory, as explained in the Getting started with workspaces, projects, experiments, and panels, in the Experiment section.

As the first statement of our code, we create a Comet experiment, as follows:

from comet-ml import Experiment

experiment = Experiment()

Now, we load the diabetes dataset, provided by scikit-learn, as follows:

from sklearn.datasets import load_diabetes

 diabetes = load_diabetes()

X = diabetes.data

y = diabetes.target

We load the dataset and stored data and target in X and y variables, respectively.

Our context is now ready, so we can move on to the second part of our experiment: defining, fitting, and evaluating the model.

Defining, fitting, and evaluating the model

Let's start with the imports:

Firstly, we import all the libraries and functions that we will use:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

We import NumPy, which we will use to build the array of different seeds, and other scikit-learn classes and methods used for the modeling phase.

We set the seeds to test, as follows:

n = 100
seed_list = np.arange(0, n+1, step = 5)

We define 20 different seeds, ranging from 0 to 100, with a step of 5. Thus, we have 0, 5, 10, 15 … 95, 100.

For each seed, we extract a different training and test set, which we use to train the same model, calculate the MSE, and log it in Comet, using the following code:

for seed in seed_list:
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)
    model = LinearRegression()
    model.fit(X_train,y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test,y_pred)
    experiment.log_metric("MSE", mse, step=seed)

In the previous code, firstly we split the dataset into training and test sets, using the current seed. We reserve 20% of the data for the test set and the remaining 80% for the training set. Then, we build the linear regression model, we fit it (model.fit()), and we predict the next values for the test set (model.predict()). We also calculate MSE through the mean_squared_error() function. Finally, we log the metric in Comet through the log_metric() method. The log_metric() method receives the metric name as the first parameter and the metric value as the second parameter. We also specify the step for the metric that corresponds to the seed, in our case.

Now, we can launch the code and see the results in Comet.

Showing results in Comet

To see the results in Comet, we perform the following steps:

Open your Comet project.
Select Experiments from the top menu.
The dashboard shows a list of all the experiments. In our case, there is just one experiment.
Click on the experiment name, in the first cell on the left. Since we have not set the experiment name, Comet has set the experiment name for us. Comet shows a dashboard with all the experiment details.

In our experiment, we can see some results in three different sections: Charts, Metrics, and System Metrics.

Under the Charts section, Comet shows all the graphs produced by our code. In our case, there is only one graph referring to MSE, as shown in the following figure:

Figure 1.18 – The value of MSE for different seeds provided as input to train_test_split()

The figure shows how the value of MSE depends on the seed value provided as input to the train_test_split() function. The produced graph is interactive, so we can view every single value in the trend line. We can download all the graphs as .jpeg or .svg files. In addition, we can download as a .json file of the data that has generated the graph. The following piece of code shows the generated JSON:

[{"x": [0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95],

"y":[3424.3166882137334,2981.5854714667616,2911.8279516891607,2880.7211416534115,3461.6357411723743,2909.472185919797,3287.490246176432,3115.203798301772,4189.681600195272,2374.3310824431856,2650.9384531241985,2702.2483323059314,3257.2142019316807,3776.092087838954,3393.8576719100192,2485.7719017765257,2904.0610865479025,3449.620077951196,3000.56755060663,4065.6795384526854],

"type":"scattergl",

"name":"MSE"}]

The previous code shows that there are two variables, x and y, and that the type of graph is scattergl.

Under the Metrics section, Comet shows a table with all the logged metrics, as shown in the following figure:

Figure 1.19 – The Metrics section in Comet

For each metric, Comet shows the last, minimum, and maximum values, as well as the step that determined those values.

Finally, under the System Metrics section, Comet shows some metrics about the system conditions, including memory usage and CPU utilization, as well as the Python version and the type of operating system, as shown in the following figure:

Figure 1.20 – System Metrics in Comet

The System Metrics section shows two graphs, one for memory usage (on the left in the figure) and the other for the CPU utilization (on the right in the figure). Under the graph, the System Metrics section also shows a table with other useful information regarding the machine that generated the experiment.