Agile Machine Learning with DataRobot

Automate each step of the machine learning life cycle, from understanding problems to delivering value

Product type Paperback
Published in Dec 2021
Publisher Packt
ISBN-13 9781801076807
Length 344 pages
Edition 1st Edition
Authors (2): Bipin Chadha, Sylvester Juwe
Table of Contents

Preface
Section 1: Foundations
  Chapter 1: What Is DataRobot and Why You Need It?
  Chapter 2: Machine Learning Basics
  Chapter 3: Understanding and Defining Business Problems
Section 2: Full ML Life Cycle with DataRobot: Concept to Value
  Chapter 4: Preparing Data for DataRobot
  Chapter 5: Exploratory Data Analysis with DataRobot
  Chapter 6: Model Building with DataRobot
  Chapter 7: Model Understanding and Explainability
  Chapter 8: Model Scoring and Deployment
Section 3: Advanced Topics
  Chapter 9: Forecasting and Time Series Modeling
  Chapter 10: Recommender Systems
  Chapter 11: Working with Geospatial Data, NLP, and Image Processing
  Chapter 12: DataRobot Python API
  Chapter 13: Model Governance and MLOps
  Chapter 14: Conclusion
Other Books You May Enjoy

Challenges associated with data science

It is no secret that getting value from data science projects is hard, and many projects end in failure. Some of the reasons are common to any type of project, but data science projects also face unique challenges. Data science is still a relatively young and immature discipline and therefore suffers from the problems that any emerging discipline encounters. Practitioners can learn from more mature disciplines and avoid mistakes that those disciplines have already learned to avoid. Let's review some of the key issues that make data science projects challenging:

  • Lack of good-quality data: This is a common refrain, and the problem is not likely to go away anytime soon. The key reason is that most organizations are used to collecting data for reporting, which tends to be aggregate, success-oriented information. Data for building models, on the other hand, needs to be detailed and should capture all outcomes, not only the successes. Many organizations invest heavily in data and data warehouses in response to the need for data; the mistake they make is collecting it from the perspective of reporting rather than modeling. Hence, even after all the time and money spent, they still do not have enough usable data. This frustrates senior leaders, who want to know why their teams cannot make use of these large data warehouses built at enormous expense. Taking some time to develop a systemic understanding of the business can help mitigate this problem, as discussed in the following chapters.
  • Explosion of data: Data is being generated and collected at an exponentially growing rate. As more data is collected, its sheer scale makes it harder to analyze and understand through traditional reporting methods. New data also spawns new use cases that were previously not possible. The growth in data also increases noise, which makes it increasingly difficult to extract meaningful insights with traditional methods.
  • Shortage of experienced data scientists: This is another topic that gets a lot of press. One reason for the shortage is that data science is a relatively new field where techniques and methods are still rapidly evolving. Another is that it is a multi-disciplinary field that requires expertise in several areas, such as statistics, computer science, and business, as well as knowledge of the domain where it is to be applied. Most of the talent pool today is relatively inexperienced, so most data scientists have not had a chance to work on a variety of use cases with a broad range of methods and data types. Best practices are still evolving and are not in widespread use. As more and more jobs become data-driven, it will also become important for a broad range of employees to become data-savvy.
  • Immature tools and environments: Most of the tools and environments in use are relatively immature, which makes it difficult to build and deploy models efficiently. Most of a data scientist's time is spent wrestling with data and infrastructure issues, which limits the time available for understanding the business problem and evaluating the business and ethical implications of models. This in turn increases the odds that a project fails to produce lasting business value.
  • Black box models: As the complexity of models rises, our ability to understand what they are doing goes down. This lack of transparency creates many problems and can lead to models producing nonsensical or, at worst, dangerous results. To make matters worse, these models tend to have better accuracy on training and validation datasets, which makes them tempting to choose despite their opacity. Black box models also tend to be difficult to explain to stakeholders and are therefore less likely to be adopted by users.
  • Bias and fairness: The issue of ML models being biased and unfair has been raised in recent years, and it is a key concern for anyone looking to develop and deploy ML models. Bias can creep into models via biased data, biased processes, or even biased decision-making using model results. The use of black box models makes this problem much harder to track and manage. Bias and unfairness are hard to detect, but addressing them will be increasingly important, not only for an organization's reputation but also because of the regulatory or legal problems that biased models can create.
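
The first point in the list, data collected for reporting versus data needed for modeling, can be illustrated with a small sketch. Everything in it is hypothetical and not taken from the book or from DataRobot: the `contacts` records, the field names, and the two helper functions are assumptions chosen only to show how a success-oriented aggregate drops the detail and the negative outcomes that a model needs.

```python
from collections import Counter

# Hypothetical row-level records: one row per customer contact,
# including the contacts that did NOT convert.
contacts = [
    {"month": "2021-01", "channel": "email", "converted": True},
    {"month": "2021-01", "channel": "email", "converted": False},
    {"month": "2021-01", "channel": "phone", "converted": False},
    {"month": "2021-02", "channel": "phone", "converted": True},
    {"month": "2021-02", "channel": "email", "converted": False},
]


def reporting_view(rows):
    """Success-oriented aggregate: number of conversions per month only."""
    return dict(Counter(r["month"] for r in rows if r["converted"]))


def modeling_view(rows):
    """Row-level (features, label) pairs that keep every outcome."""
    return [((r["month"], r["channel"]), int(r["converted"])) for r in rows]


report = reporting_view(contacts)
examples = modeling_view(contacts)
negatives = sum(1 for _, label in examples if label == 0)

print("Reporting view:", report)
print("Modeling rows:", len(examples), "of which negative outcomes:", negatives)
```

The reporting view keeps only a count of successes per month, so the channel of each contact and every unsuccessful contact are no longer recoverable from it. The modeling view keeps one labeled row per contact, which is the level of detail the bullet describes as necessary for building models.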

Before we discuss how to address these challenges, we need to introduce you to DataRobot because, as you might have guessed, DataRobot helps in addressing many of these challenges.
