Comet for Data Science

Enhance your ability to manage and optimize the life cycle of your data science project

Product type Paperback
Published in Aug 2022
Publisher Packt
ISBN-13 9781801814430
Length 402 pages
Edition 1st Edition
Author: Angelica Lo Duca

Motivation, purpose, and first access to the Comet platform

Comet is a platform, available both as a cloud service and as a self-hosted installation, that provides tools to track, compare, describe, and optimize data science experiments and models, from the beginning of a data science project up to its final monitoring stage.

In this section, we will describe the following:

  • Motivation – why and when to use Comet
  • Purpose – what Comet can be used for and what it is not suitable for
  • First access to the Comet platform – a quick-start guide to access the Comet platform

We begin with the motivation for using Comet.

Motivation

Typically, a data science project life cycle involves the following steps:

  1. Understanding the problem – Define the problem to be investigated and understand which types of data are needed. This step is crucial, since a misinterpretation of data may produce the wrong results.
  2. Data collection – All the strategies used to collect and extract data related to the defined problem. If data is already provided by a company or stakeholder, it could also be useful to search for other data that could help to better model the problem.
  3. Data wrangling – All the algorithms and strategies used to clean and filter data. Exploratory Data Analysis (EDA) techniques can be used to get an idea of the shape of the data.
  4. Feature engineering – The set of techniques used to extract from data the input features that will be used to model the problem.
  5. Data modeling – All the algorithms implemented to model data, in order to extract predictions and future trends. Typically, data modeling includes machine learning, deep learning, text analytics, and time series analysis techniques.
  6. Model evaluation – The set of strategies used to measure and test the performance of the implemented model. Depending on the defined problem, different metrics should be calculated.
  7. Model deployment – When the model reaches good performance and passes all the tests, it can be moved to production. Model deployment includes all the techniques used to make the model ready to be used with real and unseen data.
  8. Model monitoring – A model could become obsolete; thus, it should be monitored to check whether there is performance degradation. If this is the case, the model should be updated with fresh data.

We can use Comet to organize, track, save, and secure almost all the steps of a data science project life cycle, as shown in the following figure. The steps where Comet can be used are highlighted in green rectangles:

Figure 1.1 – The steps in a data science project life cycle, highlighting where Comet is involved in green rectangles

Comet can be used in the following steps:

  • Data wrangling – Thanks to its integration with popular Python libraries for data visualization, such as matplotlib, plotly, and PIL, we can build panels in Comet to perform EDA, which can be used as a preliminary step for data wrangling. We will describe the concept of a panel in more detail in the next sections and chapters of this book.
  • Feature engineering – Comet provides an easy way to track different experiments, which can be compared to select the best input feature sets.
  • Data modeling – Comet can be used to debug your models, as well as to perform hyperparameter tuning through the Comet Optimizer. We will illustrate how to work with the Optimizer in the next chapters of this book.
  • Model evaluation – Comet provides different tools to evaluate a model, including panels, evaluation metrics extracted from each experiment, and the possibility to compare different experiments.
  • Model monitoring – Once a model has been deployed, you can continue to track it in Comet with the previously described tools. Comet also provides an external service, named Model Production Monitoring (MPM), that lets us monitor the performance of a model in real time. The MPM service is not included in the Comet free plan.
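
To make the idea of comparing experiments more concrete, the following is a minimal sketch in plain Python of how tracked experiments could be compared to select the best input feature set. It does not use the Comet API: the ExperimentRecord class, the best_experiment function, the feature names, and the metric values are all hypothetical and made up for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExperimentRecord:
    """Hypothetical stand-in for one tracked experiment (not a Comet class)."""
    name: str
    features: List[str]
    metrics: Dict[str, float] = field(default_factory=dict)


def best_experiment(records, metric, higher_is_better=True):
    """Return the record with the best logged value for the given metric."""
    # Ignore experiments that never logged this metric.
    scored = [r for r in records if metric in r.metrics]
    if not scored:
        raise ValueError(f"no experiment logged the metric {metric!r}")
    pick = max if higher_is_better else min
    return pick(scored, key=lambda r: r.metrics[metric])


# Illustrative values only; they are not results from a real model.
records = [
    ExperimentRecord("baseline", ["age", "income"], {"f1": 0.71}),
    ExperimentRecord("with_region", ["age", "income", "region"], {"f1": 0.78}),
    ExperimentRecord("age_only", ["age"], {"f1": 0.64}),
]

winner = best_experiment(records, "f1")
print(winner.name, winner.features)
```

In Comet, this kind of comparison is done on the platform rather than by hand; the sketch only shows the underlying idea of keeping one record per experiment and ranking the records by a shared metric.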

Comet cannot be used directly to deploy a model. However, we can easily integrate Comet with GitLab, one of the most popular DevOps platforms. We will discuss the integration between Comet and GitLab in Chapter 7, Extending the GitLab DevOps Platform with Comet.

To summarize, Comet provides a single point of access to almost all the steps in a data science project, thanks to the different tools and features provided. Compared with a traditional, manual pipeline, Comet automates many tasks and reduces error propagation throughout the data science process.

Now that you are familiar with why and when to use Comet, we can move on to looking at the purpose of Comet.

Purpose

The main objective of Comet is to provide you with a platform where you can do the following:

  • Organize your project into different experiments – This is useful when you want to try different strategies or algorithms or produce different models.
  • Track, reproduce, and store experiments – Comet assigns each experiment a unique identifier; thus, you can track every single change in your code without having to record the changes yourself. In fact, Comet also stores the code used to run each experiment.
  • Share your projects and experiments with other collaborators – You can invite other members of your team to read or modify your experiments, thus making it easy to extract insights from data or to choose the best model for a given problem.
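
The tracking idea described in the preceding list can be sketched in a few lines of plain Python. This is not how Comet is implemented and it does not use the Comet API: the TrackedRun class, the code_changed function, and the example source strings are hypothetical. The sketch only assumes that each run receives its own identifier and that the code used for the run is stored with it.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TrackedRun:
    """Hypothetical record of one experiment run (not Comet's data model)."""
    source_code: str
    parameters: Dict[str, float]
    # Every run gets its own random identifier when it is created.
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    @property
    def code_fingerprint(self) -> str:
        """Short hash of the stored code, used to detect changes between runs."""
        digest = hashlib.sha256(self.source_code.encode("utf-8")).hexdigest()
        return digest[:12]


def code_changed(first: TrackedRun, second: TrackedRun) -> bool:
    """Return True if the two runs were produced by different code."""
    return first.code_fingerprint != second.code_fingerprint


# Made-up source snippets standing in for the code of two experiments.
run_a = TrackedRun("model = fit(x, y, depth=3)", {"depth": 3})
run_b = TrackedRun("model = fit(x, y, depth=5)", {"depth": 5})

print(run_a.run_id, run_a.code_fingerprint)
print(run_b.run_id, run_b.code_fingerprint)
print("code changed:", code_changed(run_a, run_b))
```

Because the code is stored with each run, two runs can be compared later without keeping a manual log of what was changed between them.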

Now that you have learned about the purpose of Comet, we can illustrate how to access the Comet platform for the first time.

First access to the Comet platform

To use Comet, you need to create an account on the platform, which is available at https://www.comet.ml/. Comet provides different plans, depending on your needs. In the free version, you have access to almost all the features, but you cannot share your projects with your collaborators.

If you are an academic, you can create a premium Comet account for free by following the procedure for academics: https://www.comet.ml/signup?plan=academic. In this case, you must provide your academic account.

You can create a free account simply by clicking on the Create a Free Account button and following the procedure.
