Production-Ready Applied Deep Learning

Production-Ready Applied Deep Learning: Learn how to construct and deploy complex models in PyTorch and TensorFlow deep learning frameworks

By Tomasz Palczewski, Lenin Mookiah, Jaejun (Brandon) Lee

4.9 (7 Ratings) | Paperback | Aug 2022 | 322 pages | 1st Edition

eBook: €21.99 (€31.99) | Paperback: €38.99


Production-Ready Applied Deep Learning

Effective Planning of Deep Learning-Driven Projects

In the first chapter of the book, we would like to introduce what deep learning (DL) is and how DL projects are typically carried out. The chapter begins with an introduction to DL, providing some insight into how it shapes our daily lives. Then, we will shift our focus to DL projects by describing how they are structured. Throughout the chapter, we will put extra emphasis on the first phase, project planning; you will learn key concepts such as understanding business objectives, defining appropriate evaluation metrics, identifying stakeholders, planning resources, and distinguishing between a minimum viable product (MVP) and a fully featured product (FFP). By the end of this chapter, you should be able to construct a DL project playbook that clearly describes the goal of the project, its milestones, tasks, resource allocation, and timeline.

In this chapter, we’re going to cover the following main topics:

  • What is DL?
  • Understanding the role of DL in our daily lives
  • Overview of DL projects
  • Planning a DL project

Technical requirements

You can download the supplemental material for this chapter from the following GitHub link:

https://github.com/PacktPublishing/Production-Ready-Applied-Deep-Learning/tree/main/Chapter_1

What is DL?

It has only been a decade since DL emerged, but it has rapidly started playing an important role in our daily lives. Within the field of artificial intelligence (AI), there is a popular set of methods categorized as machine learning (ML). The goal of ML is to extract meaningful patterns from historical data in order to build a model that makes sensible predictions and decisions for newly collected data. DL is an ML technique that exploits artificial neural networks (ANNs) to capture a given pattern. Figure 1.1 presents the key components of the AI evolution, which started around the 1950s with Alan Turing, among other founders of the field, discussing intelligent machines. While various ML algorithms have been introduced sporadically since the advent of AI, it took another few decades for the field to bloom. Similarly, it has only been about a decade since DL became mainstream, because many of the algorithms in this field require extensive computational power.

Figure 1.1 – A history of AI

As shown in Figure 1.2, the key advantage of DL comes from ANNs, which enable the automatic selection of necessary features. Similar to the way that human brains are structured, ANNs are also made up of components called neurons. A group of neurons forms a layer and multiple layers are linked together to form a network. This kind of architecture can be understood as multiple steps of nested instructions. As the input data passes through the network, each neuron extracts different information, and the model is trained to select the most relevant features for the given task. Considering the role of neurons as building blocks for pattern recognition, deeper networks generally lead to greater performance, as they learn the details better:

Figure 1.2 – The difference between ML and DL

While typical ML techniques require features to be manually selected, DL learns to select important features on its own. Therefore, it can potentially be adapted to a broader range of problems. However, this advantage does not come for free. In order to train a DL model properly, the training datasets need to be large and diverse enough, which leads to an increase in training time. Interestingly, the graphics processing unit (GPU) has played a major role in reducing the training time. While a central processing unit (CPU) demonstrates its effectiveness in carrying out complex computations with its broader instruction set, a GPU is more suitable for processing simpler but larger computations thanks to its massive parallelism. By exploiting this advantage in the matrix multiplications that DL models heavily depend on, the GPU has become a critical component of DL.
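
As a concrete, minimal illustration of neurons grouped into layers and layers stacked into a network, the following sketch builds a small fully connected model in PyTorch (one of the two frameworks used throughout this book) and moves it to a GPU when one is available. The layer sizes and dummy inputs are arbitrary choices for illustration only, not an example from the book's repository.

# A minimal sketch: a small fully connected network in PyTorch, placed on a
# GPU when one is available. Layer sizes and data are illustrative only.
import torch
import torch.nn as nn

class SmallNet(nn.Module):
    def __init__(self, num_features: int, num_classes: int):
        super().__init__()
        # Each nn.Linear layer is a group of neurons; stacking layers forms the network.
        self.layers = nn.Sequential(
            nn.Linear(num_features, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)

# Train on the GPU if CUDA is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SmallNet(num_features=20, num_classes=2).to(device)
batch = torch.randn(8, 20, device=device)  # a dummy mini-batch of 8 samples
print(model(batch).shape)                  # torch.Size([8, 2])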

As we are still in the early stages of the AI era, chip technology is evolving continuously, and it is not yet clear how these technologies will change in the future. It is worth mentioning that new designs come from start-ups as well as big tech companies. This ongoing race clearly shows that more and more products and services based on AI will be introduced. Considering the growth in the market and job opportunities, we believe that it is a great time to learn about DL.

Things to remember

a. DL is an ML technique that exploits ANNs for pattern recognition.

b. DL is highly flexible because it selects the most relevant features automatically for the given task throughout training.

c. GPUs can speed up DL model training with their massive parallelism.

Now that we understand what DL is at a high level, we will describe how it shapes our daily lives in the next section.

Understanding the role of DL in our daily lives

By exploiting the flexibility of DL, researchers have made remarkable progress in domains where traditional ML techniques have shown limited performance (see Figure 1.3). The first flag was planted in the field of computer vision (CV), with digit recognition and object detection tasks. DL was then adopted for natural language processing (NLP), showing meaningful progress in translation and speech recognition tasks. It has also demonstrated its effectiveness in reinforcement learning (RL) as well as generative modeling.

The list of papers linked in the Further reading section in this chapter summarizes popular use cases of DL.

The following diagram shows various applications of DL:

Figure 1.3 – Applications of DL

However, integrating DL into an existing technology infrastructure is not an easy task; difficulties can arise from various aspects, including but not limited to budget, time, and talent. Therefore, a thorough understanding of DL has become an essential skill for those who manage DL projects: project managers, technology leads, and C-suite executives. Furthermore, knowledge of this fast-growing field will allow them to find future opportunities in their specific verticals and across the organization, as people working on AI projects actively gather new knowledge to derive innovative and competitive advantages. Overall, an in-depth understanding of DL pipelines and of developing production-ready outputs allows managers to execute projects better by effectively avoiding commonly known pitfalls.

Unfortunately, DL projects are not yet in a plug-and-play state. In many cases, they involve extensive research and adjustment phases, which can quickly drain the available resources. Above all, we have noticed that many companies struggle to move from proof of concept to production because critical decisions are made by a few people who have only a limited understanding of the project requirements and DL pipelines. With that being said, our book aims to provide a complete picture of a DL project; we will start with project planning, and then discuss how to develop MVPs and FFPs, how to utilize cloud services to scale up, and finally, how to deliver the product to targeted users.

Things to remember

a. DL has been applied to many problems in various fields, including but not limited to CV, NLP, RL, and generative modeling.

b. An in-depth understanding of DL is crucial for those leading DL projects, regardless of their position or background.

c. This book provides a complete picture of a DL project by describing how DL projects are carried out from project planning to deployment.

Next, we will take a closer look at how DL projects are structured.

Overview of DL projects

While DL and other software engineering projects have a lot in common, DL projects place extra emphasis on planning due to their extensive resource needs, which mainly come from the complexity of the models and the high volume of data involved. In general, DL projects can be split into the following phases:

  1. Project planning
  2. Building MVPs
  3. Building FFPs
  4. Deployment and maintenance
  5. Project evaluation

In this section, we provide high-level overviews of these phases. The following sections cover each phase in detail.

Project planning

As the first step, the project lead must clearly define what needs to be achieved by the project and understand the groups that can affect or be affected by it. The evaluation metrics need to be defined and agreed upon, as they will be revisited during project evaluation. Then, the team members come together to discuss individual responsibilities and how to achieve the business objectives using the available resources. This process naturally leads to a timeline, an estimate of how long the project will take. Overall, project planning should result in the generation of a document called a playbook, which includes a thorough description of how the project will be carried out and evaluated.

Building minimum viable products

Once the direction is clear for everyone, the next step is to build an MVP, a simplified version of the target deliverable that showcases the project's value. Another important aspect of this phase is to understand the project's difficulties and reject paths with greater risks or less promising outcomes by following the fail fast, fail often philosophy. Therefore, data scientists and engineers typically work with partially sampled datasets in development settings and ignore insignificant optimizations.
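
For example, a team might keep only a small, reproducible slice of the full dataset while iterating on the MVP so that experiments stay cheap and fast. The following is a minimal sketch with pandas; the file names are hypothetical and the 10% sampling ratio is an arbitrary choice.

# A minimal sketch (hypothetical file names): drawing a small, reproducible
# sample of a large dataset for MVP-stage experiments.
import pandas as pd

df = pd.read_csv("training_data.csv")          # full dataset (assumed path)
sample = df.sample(frac=0.1, random_state=42)  # keep 10% of the rows, reproducibly
sample.to_csv("training_data_sample.csv", index=False)
print(f"kept {len(sample)} of {len(df)} rows for MVP experiments")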

Building fully featured products

Once the feasibility of the project has been confirmed by the MVP, it must be packaged into an FFP. This phase aims to polish up the MVP into a production-ready deliverable with various optimizations. In the case of DL projects, additional data preparation techniques are introduced to improve the quality of the input data, or the model pipeline gets augmented slightly for greater model performance. Additionally, the data preparation and model training pipelines may be migrated to the cloud, exploiting various web services for higher throughput and quality. In this case, the whole pipeline often gets reimplemented using different tools and services. This book focuses on Amazon Web Services (AWS), the most popular cloud platform for handling high volumes of data and expensive computations.

Deployment and maintenance

In many cases, the deployment settings are different from the development settings. Therefore, different sets of tools are often involved when moving an FFP to production. Furthermore, deployment may introduce problems that weren’t visible during development, which mainly arise as a result of limited computational resources. Consequently, many engineers and scientists spend additional time improving the user experience during this phase. Most people believe that deployment is the last step. However, there is one more step: maintenance. The quality of data and model performance needs to be monitored consistently to provide stable services to targeted users.

Project evaluation

In the last phase, project evaluation, the team should revisit the discussions made during project planning to evaluate whether the project has been carried out successfully or not. Furthermore, the details of the project need to be recorded, and potential improvements must be discussed so that the next projects can be achieved more efficiently.

Things to remember

a. The phases within DL projects are split into project planning, building MVPs, building FFPs, deployment and maintenance, and project evaluation.

b. During the project planning phase, the project goal and evaluation metrics are defined, and the team discusses individual responsibilities, the available resources, and the timeline for the project.

c. The purpose of building an MVP is to understand the difficulties of the project and reject paths that pose greater risks or offer less promising outcomes.

d. The FFP is a production-ready deliverable that is an optimized version of the MVP. The data preparation pipeline and model training pipeline may be migrated to the cloud, exploiting various web services for higher throughput and quality.

e. Deployment settings often provide limited computational resources. In this case, the system needs to be tuned to provide stable services to target users.

f. Upon the completion of the project, the team needs to revisit the timeline, assigned responsibilities, and business requirements to evaluate the success of the project.

In the following section, we will walk you through how to plan a DL project properly.

Planning a DL project

Every project starts with planning. Throughout the planning, the purpose of the project needs to be clearly defined, and key members should have a thorough understanding of the available resources that can be allocated to the project. Once team members and stakeholders are identified, the next step is to discuss each individual’s responsibility and create a timeline for the project.

This phase should result in a well-documented project playbook that precisely defines business objectives and how the project will be evaluated. A typical playbook contains an overview of key deliverables, a list of stakeholders, a Gantt chart defining steps and bottlenecks, definitions of responsibilities, timelines, and evaluation criteria. In the case of highly complex projects, following the Project Management Body of Knowledge (PMBOK®) Guide (https://www.pmi.org/pmbok-guide-standards/foundational/pmbok) and considering every knowledge domain (for example, integration management, project scope management, and time management) are strongly recommended. Of course, other project management frameworks exist, such as PRINCE2 (https://www.prince2.com/usa/what-is-prince2), which can provide a good starting point. Once the playbook is constructed, every stakeholder must review and revise it until everyone agrees with the contents.

In real life, many people underestimate the importance of planning. Especially in start-ups, engineers are eager to dive into MVP development and spend minimal time planning. However, it is especially dangerous to do so in the case of DL projects because the training process can quickly drain all the available resources.

Defining goal and evaluation metrics

The very first step of planning is to understand what purpose the project serves. The goal might be developing a new product, improving the performance of an existing service, or saving on operational costs. The motivation of the project naturally helps define the evaluation metrics.

In the case of DL projects, there are two types of evaluation metrics: business-related metrics and model-based metrics. Some examples of business-related metrics are as follows: conversion rate, click-through rate (CTR), lifetime value, user engagement measure, savings in operational cost, return on investment (ROI), and revenue. These are commonly used in advertising, marketing, and product recommendation verticals.

On the other hand, model-based metrics include accuracy, precision, recall, F1-score, rank accuracy metrics, mean absolute error (MAE), mean squared error (MSE), root-mean-square error (RMSE), and normalized mean absolute error (NMAE). In general, tradeoffs can be made between the various metrics. For example, a slight decrease in accuracy may be acceptable if meeting latency requirements is more critical to the project.
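
As a minimal illustration (using scikit-learn purely for convenience; the labels and targets below are made up), several of these model-based metrics can be computed as follows:

# A minimal sketch with hypothetical predictions: computing a few of the
# model-based metrics mentioned above using scikit-learn.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_absolute_error, mean_squared_error)

# Classification metrics on hypothetical binary labels.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))

# Regression metrics (MAE, MSE, RMSE) on hypothetical continuous targets.
y_true_reg = [3.0, 5.0, 2.5, 7.0]
y_pred_reg = [2.8, 5.4, 2.0, 7.3]
mse = mean_squared_error(y_true_reg, y_pred_reg)
print("MAE :", mean_absolute_error(y_true_reg, y_pred_reg))
print("MSE :", mse)
print("RMSE:", mse ** 0.5)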

Along with the target evaluation metric, which differs from project to project, there are other metrics that are commonly found in most projects. These are due dates and resource usage. The target state must be reached by a certain date using available resources.

The goal and the corresponding evaluation metrics need to be fair. If the goal is too hard to achieve, project members may lose motivation. If the evaluation metric is not chosen correctly, understanding the impact of the project becomes difficult. As a result, it is recommended that the selected evaluation metrics be shared with others and considered fair by everyone.

Figure 1.4 – A sample playbook with the project description section filled out

As shown in Figure 1.4, the first section of a playbook begins with a general description, an estimated complexity of the technical aspects, and a list of required tools and frameworks. Next, it clearly describes the objective of the project and how the project will be evaluated.

Stakeholder identification

In the same way that the term stakeholder is used for a business, a stakeholder for a project refers to a person or group who can affect or be affected by the project. Stakeholders can be grouped into two types: internal and external. Internal stakeholders are those who are directly involved in project execution, while external stakeholders sit outside of this circle and support the project execution in an indirect way.

Each stakeholder has a different role within the project. First, we’ll look at internal stakeholders. Internal stakeholders are the main drivers of the project. Therefore, they work closely together to process and analyze data, develop a model, and build deliverables. Table 1.1 lists internal stakeholders that are commonly found in DL projects:

Table 1.1 – Common internal stakeholders for DL projects

On the other hand, external stakeholders often play supportive roles, such as collecting necessary data for the project or providing feedback about the deliverable. In Table 1.2, we describe some common external stakeholders for DL projects:

Table 1.2 – Common external stakeholders for DL projects

Stakeholders are described in the second section of a playbook. As shown in Figure 1.4, the playbook must list stakeholders and their responsibilities in the project.

Task organization

A milestone refers to a point in a project where a significant event occurs. Therefore, there is a set of requirements leading up to a milestone; once the requirements are met, the milestone can be claimed to have been reached. One of the most important steps in project planning is defining milestones and their associated tasks. The ordering of tasks that leads to the goal is called the critical path. It is worth mentioning that tasks don't always need to be tackled sequentially. Understanding the critical path is important because it allows the team to prioritize tasks appropriately to ensure the success of the project.

In this step, it is also critical to identify level-of-effort (LOE) activities and supportive activities, which are required for project execution. In the case of software development projects, LOE activities include supplementary tasks, such as setting up Git repositories or reviewing others’ merge requests. The following figure (Figure 1.5) describes a typical critical path for a DL project. It will be structured differently if the underlying project consists of different tasks, requirements, technologies, and desired levels of detail:

Figure 1.5 – A typical critical path for a DL project
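
To make the idea concrete, the following sketch computes the longest chain of dependent tasks, that is, the critical path, from per-task durations and prerequisites. The task graph is hypothetical and not the book's example; note how the LOE activity stays off the critical path.

# A minimal sketch (hypothetical task graph): finding the critical path as the
# longest chain of dependent tasks.

# Task -> (duration in weeks, list of prerequisite tasks)
tasks = {
    "collect data":   (2, []),
    "prepare data":   (2, ["collect data"]),
    "train baseline": (3, ["prepare data"]),
    "set up repo":    (1, []),                                # LOE activity, off the critical path
    "evaluate model": (1, ["train baseline"]),
    "deploy MVP":     (2, ["evaluate model", "set up repo"]),
}

def critical_path(tasks):
    """Return (total duration, ordered task list) of the longest dependency chain."""
    memo = {}

    def longest(task):
        if task in memo:
            return memo[task]
        duration, deps = tasks[task]
        best_len, best_path = 0, []
        for dep in deps:
            cand_len, cand_path = longest(dep)
            if cand_len > best_len:
                best_len, best_path = cand_len, cand_path
        memo[task] = (best_len + duration, best_path + [task])
        return memo[task]

    return max((longest(t) for t in tasks), key=lambda pair: pair[0])

length, path = critical_path(tasks)
print(f"critical path ({length} weeks): {' -> '.join(path)}")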

Resource allocation

For a DL project, there are two main resources that require explicit resource allocation: human and computational resources. Human resources refer to the employees who will actively work on individual tasks. In general, they hold positions in data engineering, data science, DevOps, or software engineering. When people talk about human resources, they often consider headcount only. However, the knowledge and skills that individuals hold are other critical factors. Human resources are closely related to how fast the project can be carried out.

Computational resources refer to hardware and software resources that are allocated to the project. Unlike typical software engineering projects, such as mobile app development or web page development, DL projects require heavy computation and large amounts of data. Common techniques for speeding up the development process involve parallelism or using computationally stronger machines. In some cases, tradeoffs need to be made between them, as a single machine of high computational power can cost more than multiple machines of low computational power.

Overall, modern DL pipelines take advantage of flexible and stateless resources, such as AWS Spot instances combined with fault-tolerant code. Besides hardware resources, there are frameworks and services that may provide the necessary features out of the box. If a necessary service requires payment, it is important to understand how it can change the project execution and what the demand on human resources would be if the team decided to handle it in-house.
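
As a back-of-the-envelope sketch of why this matters, the snippet below compares on-demand and Spot costs for two instances running for roughly five months, in line with the small-team example that follows. The hourly rates are made up for illustration; always check current AWS pricing.

# A rough cost-estimation sketch with made-up hourly rates (illustrative only,
# not official AWS pricing).
HOURS_PER_MONTH = 730

def total_cost(hourly_rate: float, instances: int, months: float) -> float:
    """Cost of keeping `instances` machines running continuously for `months`."""
    return hourly_rate * instances * months * HOURS_PER_MONTH

on_demand_rate = 3.00  # assumed $/hour for a GPU instance
spot_rate = 1.00       # assumed discounted Spot $/hour

for label, rate in [("on-demand", on_demand_rate), ("Spot", spot_rate)]:
    print(f"{label:>9}: ${total_cost(rate, instances=2, months=5):,.0f} for 2 instances over 5 months")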

In this step, the available resources need to be allocated to each task. Figure 1.6 lists the tasks described in the previous section and describes the allocated resources, along with estimates of operational costs; each task has its own risk-level indicator. The example is designed for a small team of three people working on a simple DL project with limited computational resources, namely a couple of AWS Elastic Compute Cloud (EC2) instances, for around 4 to 6 months. Please note that the cost estimation of human resources is not included in the example, as it differs a lot depending on geographic location and team seniority:

Figure 1.6 – A sample resource allocation section of a DL project

Before we move on to the next step, we would like to mention that it is important to set aside a portion of the resources as a backup, in case a milestone requires more resources than have been allocated.

Defining a timeline

Now that we know the available resources, we should be able to construct a timeline for the project. In this step, the team needs to discuss how long each step will take to complete. It is worth mentioning that things don't always work out as planned. There will be many difficulties throughout the project that can delay the delivery of the final product.

Therefore, including buffers within the timeline is a common practice in many organizations. It is important that every stakeholder agrees with the timeline; if anyone believes that it's not reasonable, adjustments need to be made right away. Figure 1.7 is a sample Gantt chart with the most likely estimated timeline for the information presented in Figure 1.6:

Figure 1.7 – A sample Gantt chart describing the timeline

It is worth mentioning that the chart can also be used to monitor the progress of each task and the overall project. In such cases, additional comments or visualizations summarizing the progress can be attached to each indicator bar.
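
Returning to the buffers mentioned above, one common way to produce buffered, most-likely estimates is three-point (PERT-style) estimation, where each task gets an optimistic, most likely, and pessimistic duration. The book does not prescribe this particular technique; the sketch below uses made-up numbers and an arbitrary 15% contingency purely for illustration.

# A quick sketch (hypothetical durations): three-point estimates plus a
# contingency buffer for the overall timeline.
tasks = {
    # task: (optimistic, most likely, pessimistic) duration in weeks
    "data preparation":  (1, 2, 4),
    "model development": (2, 3, 6),
    "deployment":        (1, 2, 3),
}

total = 0.0
for name, (o, m, p) in tasks.items():
    estimate = (o + 4 * m + p) / 6  # PERT weighted estimate
    total += estimate
    print(f"{name:>18}: {estimate:.1f} weeks")

buffer = 0.15 * total  # an extra 15% contingency buffer (arbitrary choice)
print(f"planned timeline: {total:.1f} weeks of work + {buffer:.1f} weeks buffer")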

Managing a project

Another important aspect of a DL project that needs to be discussed during the project planning phase is the process that the team will follow to update other team members and ensure on-time delivery of the project. Out of various project management methodologies, Agile fits perfectly, as it helps to split work into smaller parts and quickly iterate over development cycles until the FFP emerges. As DL projects are generally considered highly complex, it is more convenient to work within short cycles of research, development, and optimization phases. At the end of each cycle, stakeholders review results and adjust their long-term goals. Agile methodology is particularly suitable for small teams of experienced individuals. In a typical setting, 2-week sprints are found to be the most effective, especially when the short-term goals are clearly defined.

During a sprint meeting, the team reviews goals from the last sprint and defines goals for the upcoming sprint. It is also recommended to have short daily meetings to go over work performed on the previous day and plan for the upcoming day, as this process can help the team to quickly recognize bottlenecks and adjust their priorities as necessary. Commonly used tools for this process are Jira, Asana, and Quickbase. The majority of the aforementioned tools also support budget management, timeline reviewing, idea management, and resource tracking.

Things to remember

a. Project planning should result in a playbook that clearly describes what purpose the project serves and how the team will move together to reach a particular goal state.

b. The first step of project planning is to define a goal and its corresponding evaluation metrics. In the case of DL projects, there are two types of evaluation metrics: business-related metrics and model-based metrics.

c. A stakeholder refers to a person or a group who can affect or be affected by the project. Stakeholders can be grouped into two types: internal and external.

d. The next stage of project planning is task organization. The team needs to identify milestones and their associated tasks, along with LOE activities, and understand the critical path.

e. For DL projects, there are two main resources that require explicit resource allocation: human and computational resources. During resource allocation, it is important to put aside a portion of the resources as a backup.

f. The estimated timeline for the project must be shared within the team, and every stakeholder must agree with the schedule.

g. Agile methodology is a perfect fit for managing DL projects, as it helps to split work into smaller parts and quickly iterate over development cycles.

Summary

This chapter is an introduction to our journey. In the first two sections, we described where DL sits within the wider picture of AI and how it continually shapes our daily lives. The key takeaways are that DL is highly flexible due to its unique model architecture, and that DL has been actively adopted in domains where traditional ML techniques have failed to demonstrate notable accomplishments.

Then, we provided a high-level view of a DL project. In general, DL projects can be split into the following phases: project planning, building MVPs, building FFPs, deployment and maintenance, and project evaluation.

The main content of this chapter covered the most important step of a DL project: project planning. In this phase, the purpose of the project needs to be clearly defined along with the evaluation metrics, everyone must have a solid understanding of the stakeholders and their respective roles, and lastly, the tasks, milestones, and timeline need to be agreed upon by the participants. The outcome of this phase is a well-formatted document called a playbook. In the next chapter, we will learn how to prepare data for DL projects.

Further reading

Here is a list of references that can help you gain more knowledge about the topics that are relevant to this chapter. The following research papers summarize popular use cases of DL:

  • CV
    • Gradient-based learning applied to document recognition by LeCun et al.
    • ImageNet: A Large-Scale Hierarchical Image Database by Deng et al.
  • NLP
    • A Neural Probabilistic Language Model by Bengio et al.
    • Speech Recognition with Deep Recurrent Neural Networks by Graves et al.
  • RL
    • An Introduction to Deep Reinforcement Learning by François-Lavet et al.
  • Generative modeling
    • Generative Adversarial Networks by Goodfellow et al.

Key benefits

  • Understand how to execute a deep learning project effectively using various tools available
  • Learn how to develop PyTorch and TensorFlow models at scale using Amazon Web Services
  • Explore effective solutions to various difficulties that arise from model deployment

Description

Machine learning engineers, deep learning specialists, and data engineers encounter various problems when moving deep learning models to a production environment. The main objective of this book is to close the gap between theory and applications by providing a thorough explanation of how to transform various models for deployment and efficiently distribute them with a full understanding of the alternatives. First, you will learn how to construct complex deep learning models in PyTorch and TensorFlow. Next, you will acquire the knowledge you need to transform your models from one framework to the other and learn how to tailor them for specific requirements that deployment environments introduce. The book also provides concrete implementations and associated methodologies that will help you apply the knowledge you gain right away. You will get hands-on experience with commonly used deep learning frameworks and popular cloud services designed for data analytics at scale. Additionally, you will get to grips with the authors’ collective knowledge of deploying hundreds of AI-based services at a large scale. By the end of this book, you will have understood how to convert a model developed for proof of concept into a production-ready application optimized for a particular production setting.

Who is this book for?

Machine learning engineers, deep learning specialists, and data scientists will find this book helpful in closing the gap between the theory and application with detailed examples. Beginner-level knowledge in machine learning or software engineering will help you grasp the concepts covered in this book easily.

What you will learn

  • Understand how to develop a deep learning model using PyTorch and TensorFlow
  • Convert a proof-of-concept model into a production-ready application
  • Discover how to set up a deep learning pipeline in an efficient way using AWS
  • Explore different ways to compress a model for various deployment requirements
  • Develop Android and iOS applications that run deep learning on mobile devices
  • Monitor a system with a deep learning model in production
  • Choose the right system architecture for developing and deploying a model

Product Details

Publication date: Aug 30, 2022
Length: 322 pages
Edition: 1st
Language: English
ISBN-13: 9781803243665



Table of Contents

18 Chapters
Part 1 – Building a Minimum Viable Product
Chapter 1: Effective Planning of Deep Learning-Driven Projects
Chapter 2: Data Preparation for Deep Learning Projects
Chapter 3: Developing a Powerful Deep Learning Model
Chapter 4: Experiment Tracking, Model Management, and Dataset Versioning
Part 2 – Building a Fully Featured Product
Chapter 5: Data Preparation in the Cloud
Chapter 6: Efficient Model Training
Chapter 7: Revealing the Secret of Deep Learning Models
Part 3 – Deployment and Maintenance
Chapter 8: Simplifying Deep Learning Model Deployment
Chapter 9: Scaling a Deep Learning Pipeline
Chapter 10: Improving Inference Efficiency
Chapter 11: Deep Learning on Mobile Devices
Chapter 12: Monitoring Deep Learning Endpoints in Production
Chapter 13: Reviewing the Completed Deep Learning Project
Index
Other Books You May Enjoy

Customer reviews

Overall rating: 4.9 (7 Ratings)
5 star: 85.7% | 4 star: 14.3% | 3 star: 0% | 2 star: 0% | 1 star: 0%
Dror, Oct 29, 2022 – 5 stars
This is a wonderful and rather unique book on deep learning (DL) in production. In contrast to most available books on machine learning (ML) that cover mostly theory and/or model training, this book focuses on real-world aspects of model deployment and DL pipelines in production. It covers a variety of important (and often neglected) topics, including data preparation, model management and experiment tracking (with W&B and DVC), as well as production-related topics such as model deployment and monitoring in cloud (AWS) and mobile (iOS and Android) environments. Deep learning frameworks covered include both PyTorch and TensorFlow. I can imagine two main audiences that will benefit the most from this book: software engineers that will learn how to apply their knowledge to build AI-focused applications, and machine learning practitioners and data scientists that will learn what it takes to productionize their models and turn them into customer-facing applications. While this book should probably not be your first experience with DL, if you already have some knowledge on DL or are a software engineer focusing on DL in production, this book will take your knowledge to the next level and make you a better DL practitioner, well-versed in the different aspects of DL in production.
Amazon Verified review
Yiqiao Yin, Oct 09, 2022 – 5 stars
Machine learning engineers, deep learning specialists, and data engineers encounter various problems when moving deep learning models to a production environment. The main objective of this book is to close the gap between theory and applications by providing a thorough explanation of how to transform various models for deployment and efficiently distribute them with a full understanding of the alternatives.
✅ Learn how to construct complex deep learning models in PyTorch and TensorFlow
✅ Acquire the knowledge you need to transform your models from one framework to the other and learn how to tailor them for specific requirements that deployment environments introduce
✅ Get hands-on experience with commonly used deep learning frameworks and popular cloud services
Amazon Verified review
Steven Fernandes, Oct 18, 2022 – 5 stars
The book is a good introduction book for learning the production steps of deep learning. The book helps a beginner to understand various available frameworks for deploying deep learning models. After introducing the initial basic concepts, the end of part 1 introduces well-known deep learning project tracking using Weights & Biases, MLflow and DVC. Part 2 covers data preparation and model training using Horovod, Ray, Kubeflow, and Sagemaker. The explainable AI section at the end of part 2 could have been presented better. Part 3, deployment and maintenance, helps beginners to get an overall idea of Open Neural Network Exchange (ONNX), and Elastic Kubernetes Services. However, certain sections of the chapter could have been better. For example, chapter 11, Deep Learning on Mobile Devices, doesn't explain the detailed steps to deploy the model on iOS and Android apps. The GitHub section for chapter 11 are links that redirect us to TensorFlow lite. Overall a good introduction book to know how production can be done of deep learning models.
Amazon Verified review
Anup, Sep 27, 2022 – 5 stars
It is good for data/ML engineers to quickly learn how to productionize and support DL pipelines.
Amazon Verified review
Bill C Richmond, Dec 23, 2022 – 5 stars
Looking at the book’s table of contents should tell you if this is the book for you or not. I find many books today focus on theory rather than getting your hands dirty. This one is the latter. With loads of links, supplemental material on GitHub, code examples, and explanations, this book is for both beginners and experts. The focus on relevant tools (AWS, PyTorch, Tensorflow, SageMaker, W&B, MLFlow, Kubernetes, ONNX, etc.) is really good. My team uses most of these tools, have to convert between frameworks, use MLOps pipelines, etc., and the authors’ explanation was spot on. It’s a solid read but also a useful reference. Overall, one of the best books I’ve seen on the topic and highly recommended.
Amazon Verified review