The Deep Learning Architect's Handbook

Deep Learning Life Cycle

In this chapter, we will explore the intricacies of the deep learning life cycle. Sharing many characteristics with the machine learning life cycle, the deep learning life cycle is as much a framework as it is a methodology, one that allows a deep learning project idea to become wildly successful or to be scrapped entirely when that is the appropriate call. We will grasp the reasons why the process is cyclical and understand some of the life cycle’s initial processes on a deeper level. Additionally, we will go through some high-level sneak peeks of the later processes of the life cycle, which will be explored at a deeper level in future chapters.

Comprehensively, this chapter will help you do the following:

  • Understand the similarities and differences between the deep learning life cycle and its machine learning life cycle counterpart
  • Understand where domain knowledge fits in a deep learning project
  • Understand the key steps in planning a deep learning project to make sure it can tangibly create real-world value
  • Grasp some deep learning model development details at a high level
  • Grasp the importance of model interpretation and the variety of deep learning interpretation techniques at a high level
  • Explore high-level concepts of model deployments and their governance
  • Learn to choose the necessary tools to carry out the processes in the deep learning life cycle

We’ll cover this material in the following sections:

  • Machine learning life cycle
  • The construction strategy of a deep learning life cycle
  • The data preparation stage
  • Deep learning model development
  • Delivering model insights
  • Managing risks

Technical requirements

This chapter includes some practical implementations in the Python programming language. To complete it, you need to have a computer with the following libraries installed:

  • pandas
  • matplotlib
  • seaborn
  • tqdm
  • lingua

The code files are available on GitHub: https://github.com/PacktPublishing/The-Deep-Learning-Architect-Handbook/tree/main/CHAPTER_1.

Understanding the machine learning life cycle

Deep learning is a subset of the wider machine learning category. The main characteristic that sets it apart from other machine learning algorithms is its foundational building block, the neural network. As deep learning has advanced tremendously since the early 2000s, it has made possible many feats that were previously unachievable with other machine learning methods. Specifically, deep learning has made breakthroughs in recognizing complex patterns that exist in complex and unstructured data such as text, images, videos, and audio. Some of the successful applications of deep learning today are face recognition with images, speech recognition from audio data, and language translation with textual data.

Machine learning, on the other hand, is a subset of the wider artificial intelligence category. Its algorithms, such as tree-based models and linear models, which are not considered to be deep learning models, still serve a wide range of use cases involving tabular data, which is the bulk of the data that’s stored by small and big organizations alike. This tabular data may exist in multiple structured databases and can span from 1 to 10 years’ worth of historical data that has the potential to be used for building predictive machine learning models. Some of the notable predictive applications for machine learning algorithms are fraud detection in the finance industry, product recommendations in e-commerce, and predictive maintenance in the manufacturing industry. Figure 1.1 shows the relationships between deep learning, machine learning, and artificial intelligence for a clearer visual distinction between them:

Figure 1.1 – Artificial intelligence relationships

Now that we know what deep learning and machine learning are in a nutshell, we are ready for a glimpse of the machine learning life cycle, as shown in Figure 1.2:

Figure 1.2 – Deep learning/machine learning life cycle

As advanced and complex as deep learning algorithms are compared to other machine learning algorithms, the guiding methodologies needed to ensure success in both domains are unequivocally the same. The machine learning life cycle involves six stages that interact with each other in different ways:

  1. Planning
  2. Data Preparation
  3. Model Development
  4. Deliver Model Insights
  5. Model Deployment
  6. Model Governance

Figure 1.2 shows these six stages and the possible stage transitions depicted with arrows. Typically, a machine learning project will iterate between stages, depending on the business requirements. In a deep learning project, most innovative predictive use cases require manual data collection and data annotation, a process that lies in the realm of the Data Preparation stage. As this process is generally time-consuming, especially when the data itself is not readily available, a go-to solution is to start with an acceptable initial amount of data, transition into the Model Development stage, and subsequently move to the Deliver Model Insights stage to make sure the results of the idea are sane.

After the initial validation process, depending again on business requirements, practitioners would then decide to transition back into the Data Preparation stage and continue to iterate through these stages cyclically at different data size milestones until the results are satisfactory with respect to both the model development and business metrics. Once it gets approval from the necessary stakeholders, the project goes into the Model Deployment stage, where the built machine learning model will be served so that its predictions can be consumed. The final stage is Model Governance, where practitioners carry out tasks that manage the risk, performance, and reliability of the deployed machine learning model. Model deployment and model governance both deserve more in-depth discussion and will be introduced in separate chapters closer to the end of this book. Whenever any of the key metrics fail to hold at a predetermined confidence level, the project falls back into the Data Preparation stage of the cycle and repeats the same flow all over again.

The ideal machine learning project flows through the stages cyclically for as long as the business application needs it. However, machine learning projects are typically susceptible to a high probability of failure. According to a survey conducted by Dimensional Research and Alegion, covering around 300 machine learning practitioners from 20 different business industries, 78% of machine learning projects get held back or delayed at some point before deployment. Additionally, Gartner predicted that 85% of machine learning projects will fail (https://venturebeat.com/2021/06/28/why-most-ai-implementations-fail-and-what-enterprises-can-do-to-beat-the-odds/). By expecting the unexpected and anticipating failures before they happen, practitioners can circumvent many potential failure factors early on, in the planning stage. This also brings us to the trash icon included in Figure 1.2. Proper projects with a good plan typically get discarded only at the Deliver Model Insights stage, when it’s clear that the proposed model and project can’t deliver satisfactory results.

Now that we’ve covered an overview of the machine learning life cycle, let’s dive into each of the stages individually, broken down into sections, to help you discover the key tips and techniques needed to complete each stage successfully. These stages will be discussed in an abstract format and are not a concrete depiction of what you should ultimately be doing for your project, since all projects are unique and strategies should always be evaluated on a case-by-case basis.

Strategizing the construction of a deep learning system

A deep learning model can only realize real-world value by being part of a system that performs some sort of operation. Bringing deep learning models from research papers to actual real-world usage is not an easy task. Thus, performing proper planning before conducting any project is a more reliable and structured way to achieve the desired goals. This section will discuss some considerations and strategies that will be beneficial when you start to plan your deep learning project toward success.

Starting the journey

Today, deep learning practitioners tend to focus a lot on the algorithmic model-building part of the process. It takes a considerable amount of mental strength to not get hooked on the hype of state-of-the-art (SOTA) research-focused techniques. With crazy techniques such as pix2pix, which is capable of generating high-resolution, realistic color images from just sketches or image masks, and natural language processing (NLP) techniques such as GPT-3, a 175-billion-parameter text generation model from OpenAI, and GPT-4, its multimodal successor, both capable of generating practically anything you ask for in text form, from summaries to code, why wouldn’t they?!

Jokes aside, to become a true deep learning architect, we need to come to a consensus that any successful machine learning or deep learning project starts with the business problem, not with the shiny new research paper you just read online, complete with a public GitHub repository. The planning stage often involves business executives who are not savvy about the details of machine learning algorithms and who often wouldn’t care about them at all. These algorithms are daunting for business-focused stakeholders to understand and, when added on top of the already tough mental barriers to adopting artificial intelligence technologies, this doesn’t make the project any more likely to be adopted.

Evaluating deep learning’s worthiness

Deep learning shines the most in handling unstructured data. This includes image data, text data, audio data, and video data. This is largely due to the model’s ability to automatically learn and extract complex, high-level features from the raw data. In the case of images and videos, deep learning models can capture spatial and temporal patterns, recognizing objects, scenes, and activities. With audio data, deep learning can understand the nuances of speech, noise, and various sound elements, making it possible to build applications such as speech recognition, voice assistants, and audio classification systems. For text data, deep learning models can capture the context, semantics, and syntax, enabling NLP tasks such as sentiment analysis, machine translation, and text summarization.

This means that if this data exists and is utilized by your company in its business processes, there may be an opportunity to solve a problem with the help of deep learning. However, never overcomplicate problems just so you can solve them with deep learning. Equating this to something more relatable, you wouldn’t use a huge sledgehammer to get a nail into wood. It could work and you might get away with it, but you’d risk bending the nail or injuring yourself while using it.

Once a problem has been identified, evaluate the business value of solving it. Not all problems are born the same and they can be ranked based on their business impact, value, complexity, risks, costs, and suitability for deep learning. Generally, you’d be looking for high impact, high value, low complexity, low risks, low cost, and high suitability to deep learning. Trade-offs between these metrics are expected but simply put, make sure the problem you’ve discovered is worth solving at all with deep learning. A general rule of thumb is to always resort to a simpler solution for a problem, even if it ends up abandoning the usage of deep learning technologies. Simple approaches tend to be more reliable, less costly, less prone to risks, and faster to fruition.

Consider a problem where a solution is needed to remove background scenes in a video feed and leave only humans or necessary objects untouched so that a more suitable background scene can be overlaid as a background instead. This is a common problem in the professional filmmaking industry in all film genres today.

Semantic segmentation, which is the task of assigning a label to every pixel of an image across its width and height dimensions, is a method that can solve such a problem. In this case, the task needs to assign labels that help identify which pixels need to be removed. With the advent of many publicly available semantic segmentation datasets, deep learning has advanced considerably in the semantic segmentation field, achieving a fine-grained understanding of the world that is satisfactory enough to be applied most prominently in autonomous driving and robot navigation. However, deep learning is not known to be 100% error-free and almost always makes some errors, even on the controlled evaluation dataset. In the case of human segmentation, for example, the model would likely make the most errors in fine hair areas. Most filmmakers aim for perfect depictions in their films and require that every single pixel gets removed appropriately without fail, since a lot of money is spent on the time of the actors hired for the film. Additionally, a lot of time and money would be wasted in manually removing objects that could have been removed easily if the scene had been shot with a green screen.

This is an example of a case where we should not overcomplicate the problem. A green screen is all you need to solve the problem described: specifically, the rare chromakey green color. When green screens are prepped properly in the areas where the desired imagery will be overlaid digitally, image processing techniques alone can remove the pixels that fall within the small light intensity range centered on the chromakey green color and achieve semantic segmentation effectively with a rule-based solution. The green screen is a simpler solution that is cost-effective, foolproof, and fast to set up.

That was a mouthful! Now, let’s go through a simpler problem. Consider a problem where we want to automatically and digitally identify when it rains. In this use case, it is important to understand the actual requirements and goals of identifying the rain: is it sufficient to detect rain exactly when it happens? Or do we need to identify whether rain will happen in the near future? What will we use the information about rain events for? These questions will guide whether deep learning is required or not. We, as humans, know that rain can be predicted from visual input, either by looking at raindrops falling or by looking at cloud conditions. However, if it is sufficient to detect rain when it happens, and the goal of detecting rain is to determine when to water the plants, a simpler approach would be to use an electronic sensor that detects the presence of water or humidity. Only when you want to estimate whether it will rain in the future, let’s say in 15 minutes, does deep learning make more sense, as there are many interactions between meteorological factors that affect rainfall. Only by brainstorming each use case and analyzing all potential solutions, even outside of deep learning, can you make sure deep learning brings tangible business value compared to other solutions. Do not apply deep learning just because you want to.

At times, when value isn’t clear when you’re directly considering a use case, or when value is clear but you have no idea how to execute it, consider finding reference projects from companies in the same industry. Companies in the same industry have a high chance of wanting to optimize the same processes or solve the same pain points. Similar reference projects can serve as a guide to designing a deep learning system and can serve as proof that the use case being considered is worthy of the involvement of deep learning technologies. Of course, not everybody has access to details like this, but you’d be surprised what Google can tell you these days. Even if there isn’t a similar project being carried out for direct reference, you would likely be able to pivot upon the other machine learning project references that already have a track record of bringing value to the same industry.

Admittedly, rejecting deep learning at times can be a hard pill to swallow, considering that most practitioners get paid to implement deep learning solutions. However, dismissing it early allows you to focus your time on problems that are more valuable and more suitable to solve with deep learning, and prevents the risk of undermining its potential in cases where simpler solutions can outperform it. Criteria for deep learning worthiness should be evaluated on a case-by-case basis and, as a practitioner, the best advice to follow is to simply practice common sense. Spend a good amount of time going through the problem exploration and worthiness evaluation process. The last thing you want is to spend a painstaking amount of time preparing data, building a deep learning model, and delivering very convincing model insights, only to find out that the label you are trying to predict does not provide enough value for the business to invest further.

Defining success

Ever heard sentences like “My deep learning model just got 99% accuracy on my validation dataset!”? Data scientists often make the mistake of determining the success of a machine learning project just by using the validation metrics they use to evaluate their machine learning models during the model development process. Model-building metrics such as accuracy, precision, or recall are important metrics to consider in a machine learning project, but unless they add business value and connect to the business objectives in some way, they rarely mean anything. A project can achieve a good accuracy score but still fail to achieve the desired business goals. This can happen when no proper success metrics have been defined early on, which subsequently causes the wrong label to be used in the data preparation and model development stages. Furthermore, even when the model metric positively impacts business processes directly, there is a chance that the achievement won’t be communicated effectively to business stakeholders and, in the worst case, won’t be considered a success when reported as-is.

Success metrics, when defined early, act as the machine learning project’s guardrails and ensure that the project goals are aligned with the business goals. One such guardrail is that a success metric can help guide the choice of a proper label that can, at inference time, tangibly improve business processes or otherwise create value for the business. First, let’s make sure we are aligned on what a label means: it is the value that you want the machine learning model to predict. The purpose of a machine learning model is to assign these labels automatically given some form of input data, and thus, during the data preparation and model development stages, a label needs to be chosen to serve that purpose. Choosing the wrong label can be catastrophic to a deep learning project as sometimes, when data is not readily available, it means the project has to start all over again from the data preparation stage. Labels should always be directly or indirectly attributed to the success metric.

Success metrics, as the name suggests, can be plural, and range from time-based success definitions or milestones to the overall project success, and from intangible to tangible. It’s good practice to generally brainstorm and document all the possible success criteria from a low level to a high level. Another best practice is to make sure to always define tangible success metrics alongside intangible metrics. Intangible metrics generate awareness, but tangible metrics make sure things are measurable and thus make them that much more attainable. A few examples of intangible and hard-to-measure metrics are as follows:

  • Increasing customer satisfaction
  • Increasing employee performance
  • Improving shareholder outlook

Metrics are ways to measure something and are tied to goals to seal the deal. Goals themselves can be intangible, similar to the few examples listed previously, but so long as they are tied to tangible metrics, the project is off to a good start. When you have a clear goal, ask yourself in what way the goal can be proven to be achieved, demonstrated, or measured. A few examples of tangible success metrics for machine learning projects that could align with business goals are as follows:

  • Increase the time customers spend, which can be a proxy for customer delight
  • Increase company revenue, which can be a proxy for employee performance
  • Increase the click-through rate (CTR), which can be a proxy for the effectiveness of targeted marketing campaigns
  • Increase the customer lifetime value (CLTV), which can be a proxy for long-term customer satisfaction and loyalty
  • Increase conversion rate, which can be a proxy for the success of promotional campaigns and website user experience

This concept is neither new nor limited to machine learning projects; it applies to just about any project carried out for a company, as every real-world project needs to be aligned with a business goal. Many foundational project management techniques apply similarly to machine learning projects, and spending time gaining project management skills outside the machine learning field would be beneficial and transferable to machine learning projects. Additionally, as machine learning is considered a software-based technology, software project management methodologies also apply.

A final concluding thought to take away is that machine learning systems are not about how advanced your machine learning models are, but instead about how humans and machine intelligence can work together to achieve a greater good and create value.

Planning resources

Deep learning often involves neural network architectures with a large set of parameters, otherwise called weights. These architectures can range from holding a few parameters up to hundreds of billions of parameters. For example, OpenAI’s GPT-3 text generation model holds 175 billion neural network parameters, which amounts to around 350 GB in computer storage size. This means that to run GPT-3, you need a machine with a random access memory (RAM) size of at least 350 GB!
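
As a quick back-of-the-envelope check of where that figure comes from, here is a sketch assuming the weights are stored in 16-bit floating point (2 bytes per parameter); 32-bit storage would double the estimate:

    # Rough memory estimate for storing model weights only (no activations,
    # gradients, or optimizer states); assumes 16-bit (2-byte) weights
    num_parameters = 175e9        # GPT-3's reported parameter count
    bytes_per_parameter = 2       # FP16; use 4 for FP32
    total_gb = num_parameters * bytes_per_parameter / 1e9
    print(f"Approximate weight storage: {total_gb:.0f} GB")  # ~350 GB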

Deep learning model frameworks such as PyTorch and TensorFlow have been built to work with devices called graphics processing units (GPUs), which offer tremendous neural network model training and inference speedups. Off-the-shelf GPU devices commonly have around 12 GB of GPU RAM, which is nowhere near the requirements needed to load a GPT-3 model in GPU mode. However, there are methods to partition big models across multiple GPUs and run the model on those GPUs. Additionally, some methods allow for distributed GPU model training and inference to support larger data batch sizes at any one usage point. GPUs are not considered cheap devices and can cost anywhere from a few hundred to hundreds of thousands of dollars from the most widely used GPU brand, Nvidia. With the rise of cryptocurrency technologies, the availability of GPUs has also been reduced significantly because people buy them as soon as they are in stock. All of this emphasizes the need to plan computing resources for training and inferencing deep learning models beforehand.

It is important to align your model development and deployment needs with your computing resource allocation early in the project. Start by gauging the range of sizes of deep learning architectures that are suitable for the task at hand, either by browsing research papers or websites that provide a good summary of techniques, and set aside computing resources for the model development process.

Tip

paperswithcode.com provides summaries of a wide variety of techniques grouped by a wide variety of tasks!

When computing resources are not readily available, make sure you make purchase plans early, especially if they involve GPUs. But what if a physical machine is not desired? An alternative is to use paid cloud computing providers that you can access online easily from anywhere in the world. During the model development stage, one of the benefits of having more GPUs with more RAM allocated is that you can train models faster, either by using a larger data batch size during training or by training multiple models at any one time. It is generally fine to use CPU-only deep learning model training as well, but the model training time will inevitably be much longer.

The GPU and CPU-based computing resources that are required during training are often overkill for inference once the model is deployed. Different applications have different deployment computing requirements, and the resource specification to allocate can be gauged by asking yourself the following three questions:

  • How often are the inference requests made?
    • Many inference requests in a short period might signal the need to run more than one inference service on multiple computing devices in parallel
  • What is the average amount of samples that are requested for a prediction at any one time?
    • Device RAM requirements should match batch size expectations
  • How fast do you need a reply?
    • GPUs are needed if the response time requirement is seconds or faster
    • CPUs can do the job if you don’t care much about response time

Resource planning is not restricted to computing resources; it also extends to human resource planning. Assumptions about the number of deep learning engineers and data scientists working together in a team will ultimately affect the choice of software libraries and tools used in the model development process. The approach to choosing these tools will be introduced in future sections.

The next step is to prepare your data.

Preparing data

Data is to machine learning models what fuel is to your car, electricity is to your electronic devices, and food is to your body. A machine learning model works by trying to capture the relationships between the provided input and output data. Similar to how human brains work, a machine learning model will attempt to iterate through collected data examples and slowly build a memory of the patterns required to map the provided input data to the provided target output data. The data preparation stage consists of the methods and processes required to prepare ready-to-use data for building a machine learning model, which include the following:

  • Acquisition of raw input and targeted output data
  • Exploratory data analysis of the acquired data
  • Data pre-processing

We will discuss each of these topics in the following subsections.

Deep learning problem types

Deep learning can be broadly categorized into two problem types, namely supervised learning and unsupervised learning. Both of these problem types involve building a deep learning model that is capable of making informed predictions as outputs, given well-defined data inputs.

Supervised learning is a problem type where labels are involved that act as the source of truth to learn from. Labels can exist in many forms and can be broken down into two problem types, namely classification and regression. Classification is the process where a specific discrete class is predicted among other classes when given input data. Many more complex problems derive from the base classification problem types, such as instance segmentation, multilabel classification, and object detection. Regression, on the other hand, is the process where a continuous numerical value is predicted when given input data. Likewise, complex problem types can be derived from the base regression problem type, such as multi-regression and image bounding box regression.

Unsupervised learning, on the other hand, is a problem type where there aren’t any labels involved and the goals can vary widely. Anomaly detection, clustering, and feature representation learning are the most common problem types that belong to the unsupervised learning category.

We will go through these two problem types separately for deep learning in Chapter 8, Exploring Supervised Deep Learning, and Chapter 9, Exploring Unsupervised Deep Learning.

Next, let’s learn about the things you should consider when acquiring data.

Acquiring data

Acquiring data in the context of deep learning usually involves unstructured data, which includes image data, video data, text data, and audio data. Sometimes, data is readily available, stored through some business process in a database, but very often, it has to be collected manually from the environment from scratch. Additionally, very often, labels for this data are not readily available and require manual annotation work. Along with the capability of deep learning algorithms to process and digest highly complex data comes the need to feed them more data compared to their machine learning counterparts. The requirement to perform data collection and data annotation in high volumes is the main reason why deep learning is considered to have a high barrier to entry today.

Don’t rush into choosing an algorithm in a machine learning project. Spend a good amount of time formally defining the features that can be acquired to predict the target variable. Get help from domain experts during the process and brainstorm potential predictive features that relate to the target variable. In actual projects, it is common to spend a big portion of your time planning and acquiring the data, while making sure the acquired data is fit for a machine learning model’s consumption, and subsequently spending the rest of the time on model building, model deployment, and model governance. A lot of research has been done into handling bad-quality data during the model development stage, but most of these techniques aren’t comprehensive and are limited in how far they can compensate for the inherent quality of the data. Neglecting quality assurance during the data acquisition stage and showing enthusiasm only for the data science portion of the workflow is a strong indicator that the project is doomed to fail right from its inception.

Formulating a data acquisition strategy is a daunting task when you don’t know what it means to have good-quality data. Let’s go through a few pillars of data quality you should consider for your data in the context of actual business use cases and machine learning:

  • Representativeness: How representative is the data concerning the real-world data population?
  • Consistency: How consistent are the annotation methods? Does the same pattern match the same label or are there some inconsistencies?
  • Comprehensiveness: Are all variations of a specific label covered in the collected dataset?
  • Uniqueness: Does the data contain a lot of duplicated or similar data?
  • Fairness: Is the collected data biased toward any specific labels or data groups?
  • Validity: Does the data contain invalid fields? Do the data inputs match up with their labels? Is there missing data?

Let’s look at each of these in detail.

Representativeness

Data should be collected in a way that mimics, as much as possible, the data you will receive during model deployment. Very often in research-based deep learning projects, researchers collect their data in a closed environment with controlled environmental variables. One of the reasons researchers prefer collecting data from a controlled environment is that they can build stabler machine learning models and generally try to prove a point. Eventually, when the research paper is published, you see amazing results obtained on handpicked data chosen to impress. These models, built on controlled data, fail miserably when you apply them to random, uncontrolled real-world examples. Don’t get me wrong: it’s great to have these controlled datasets available to contribute toward a stabler machine learning model at times, but having uncontrolled real-world examples as a main part of the training and evaluation datasets is key to achieving a generalizable model.

Sometimes, the acquired training data has an expiry date and does not stay representative forever. This scenario is called data drift and will be discussed in more detail in the Managing risk section closer to the end of this chapter. The representativeness metric for data quality should also be evaluated based on the future expectations of the data the model will receive during deployment.

Consistency

Data labels that are not consistently annotated make it harder for machine learning models to learn from them. This happens when the domain ideologies and annotation strategies differ among multiple labelers or are simply not defined properly. For example, “Regular” and “Normal” mean the same thing, but to the machine, they are two completely different classes; so are “Normal” and “normal”, which differ only in capitalization!

Practice formalizing a proper strategy for label annotation during the planning stage before carrying out the actual annotation process. Cleaning the data for simple consistency errors is possible post-data annotation, but some consistency errors can be hard to detect and complex to correct.
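
As an illustration, here is a minimal sketch of a post-annotation consistency cleanup with pandas; the labels and the synonym mapping below are hypothetical and would come from an agreed annotation strategy:

    import pandas as pd

    # Hypothetical annotations collected from multiple labelers
    labels = pd.Series(["Normal", "normal ", "Regular", "Defective", "defective"])

    # Normalize whitespace and casing, then map agreed synonyms to one canonical label
    synonym_map = {"regular": "normal"}
    cleaned = labels.str.strip().str.lower().replace(synonym_map)
    print(cleaned.value_counts())  # normal: 3, defective: 2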

Comprehensiveness

Machine learning thrives on building decisioning mechanisms that are robust to multiple variations and views of any specific label. Being capable of this and accomplishing it, however, are two different things. One of the prerequisites of decisioning robustness is that the data used for training and evaluation has to be comprehensive enough to provide coverage for all possible variations of each provided label. How can comprehensiveness be judged? Well, that depends on the complexity of the labels and how much variation they naturally present when the model is deployed. More complex labels naturally require more samples and less complex labels require fewer samples.

A good point to start with, in the context of deep learning, is to have at least 100 samples for each label and experiment with building a model and deriving model insights to see if there are enough samples for the model to generalize on unseen variations of the label. When the model doesn’t produce convincing results, that’s when you need to cycle back to the data preparation stage again to acquire more data variations of any specific label. The machine learning life cycle is inherently a cyclical process where you will experiment, explore, and verify while transitioning between stages to obtain the answers you need to solve your problems, so don’t be afraid to execute these different stages cyclically.

Uniqueness

While having complete and comprehensive data is beneficial for building a machine learning model that is robust to data variations, having duplicated versions of the same data variation in the acquired dataset risks creating a biased model. A biased model makes biased decisions that can be unethical, illegal, and sometimes meaningless. Additionally, the amount of data acquired for any specific label means little when all of it is duplicated or very similar.

Machine learning models are generally trained on a subset of the acquired data and then evaluated on other subsets of the data to verify the model’s performance on unseen data. When the non-unique part of the dataset ends up, by chance, in the evaluation partition, the model risks reporting scores that are biased by the duplicated data inputs.
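
Here is a minimal sketch of how such leakage can be checked with pandas, using hypothetical training and evaluation splits:

    import pandas as pd

    # Hypothetical splits where one description is duplicated and shared across partitions
    train_df = pd.DataFrame({"description": ["red cotton shirt", "wireless mouse", "wireless mouse"]})
    eval_df = pd.DataFrame({"description": ["wireless mouse", "steel water bottle"]})

    duplicates_within_train = train_df["description"].duplicated().sum()
    leaked_into_eval = set(train_df["description"]) & set(eval_df["description"])

    print("Duplicates within the training split:", duplicates_within_train)
    print("Samples shared with the evaluation split:", leaked_into_eval)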

Fairness

Does the acquired dataset represent minority groups properly? Is the dataset biased toward the majority groups in the population? There can be many reasons why a machine learning model turns out to be biased, but one of the main causes is data representation bias. Making sure the data is represented fairly and equitably is an ethical responsibility of all machine learning practitioners. There are a lot of types of bias, so this topic will have its own section and will be introduced along with methods of mitigating it in Chapter 13, Exploring Bias and Fairness.

Validity

Are there outliers in the dataset? Is there missing data in the dataset? Did you accidentally add a blank audio or image file to the properly collected and annotated dataset? Is the annotated label for the data input considered a valid label? These are some of the questions you should ask when considering the validity of your dataset.

Invalid data is useless for machine learning models, and some of it complicates the pre-processing they require. The reasons for invalidity can range from simple human errors to complex domain knowledge mistakes. One of the methods to mitigate invalid data is to separate validated and unvalidated data. Include some form of automated or manual data validation process before a data sample gets included in the validated dataset category. Some of this validation logic can be derived from business processes or just common sense. For example, if we are taking age as input data, there are acceptable age ranges and there are age ranges that are simply impossible, such as 1,000 years old. Having simple guardrails and verifying these values early, when collecting them, makes it possible to correct them then and there to get accurate and valid data. Otherwise, such data will likely be discarded at the model-building stage. Maintaining a structured framework to validate data ensures that the majority of the data stays relevant, usable by machine learning models, and free from simple mistakes.
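
To make the age example concrete, here is a minimal sketch of such a guardrail; the acceptable range and record structure are assumptions made purely for illustration:

    def is_valid_age(age):
        """Simple guardrail: accept only plausible, non-missing ages."""
        if age is None:
            return False
        return 0 < age <= 120  # hypothetical acceptable range for this use case

    raw_records = [{"age": 34}, {"age": 1000}, {"age": None}, {"age": 58}]
    validated = [r for r in raw_records if is_valid_age(r["age"])]
    rejected = [r for r in raw_records if not is_valid_age(r["age"])]

    print("Validated:", validated)
    print("Flagged for correction or removal:", rejected)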

As for more complex invalidity, such as errors in the domain ideology, domain experts play a big part in making sure the data stays sane and logical. Always make sure you include domain experts when defining the data inputs and outputs in the discussion about how data should be collected and annotated for model development.

Making sense of data through exploratory data analysis (EDA)

After acquiring the data, it is crucial to analyze it to inspect its characteristics, the patterns that exist within it, and its general quality. Knowing the type of data you are dealing with allows you to plan a strategy for the subsequent model-building stage. Plot distribution graphs, calculate statistics, and perform univariate and multivariate analysis to understand the inherent relationships in the data, which can help further ensure its validity. The methods of analysis differ for different variable types and can require some form of domain knowledge beforehand. In the following subsections, we will practically go through exploratory data analysis (EDA) for text-based data to get a sense of the benefits of carrying out an EDA task.

Practical text EDA

In this section, we will manually explore and analyze a text-specific dataset using Python code, with the motive of building a deep learning model later in this book using the same dataset. The dataset will be used to predict the category of an item on an Indian e-commerce website based on its textual description. This use case is useful for automatically grouping advertised items for user recommendations and can help increase purchasing volume on the e-commerce website:

  1. Let’s start by defining the libraries that we will use in a notebook. We will be using pandas for data manipulation and structuring, matplotlib and seaborn for plotting graphs, tqdm for visualizing iteration progress, and lingua for text language detection:
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from tqdm import tqdm
    from lingua import Language, LanguageDetectorBuilder
    tqdm.pandas()
  2. Next, let’s load the text dataset using pandas:
    dataset = pd.read_csv('ecommerceDataset.csv')
  3. pandas has some convenient functions to visualize and describe the loaded dataset; let’s use them. Let’s start by visualizing three rows of the raw data:
    dataset.head(3)

    This will display the following figure in your notebook:

Figure 1.3 – Visualizing the text dataset samples

  4. Next, let’s describe the dataset by visualizing its column-based statistics:
    dataset.describe()

    This will display the following figure in your notebook:

Figure 1.4 – Showing the statistics of the dataset

  5. With these visualizations, it’s clear that the dataset matches its description: the category column contains four unique class categories paired with a text data column named description, and both are strings. One important insight from the describe function is that there are duplicates in the text descriptions. We can remove duplicates by keeping the first row among all duplicates, but we also have to make sure that the duplicates share the same category, so let’s verify that and then drop them:
    # Assumed setup: count how often each unique description appears
    unique_description_information = dataset['description'].value_counts()
    for i in tqdm(range(len(unique_description_information))):
        # Every duplicated description must map to exactly one category
        assert(
            len(
                dataset[
                    dataset['description'] ==
                    unique_description_information.keys()[i]
                ]['category'].unique()
            ) == 1
        )
    dataset.drop_duplicates(subset=['description'], inplace=True)
  6. Let’s check the data types of the columns:
    dataset.dtypes

    This will display the following figure in your notebook:

Figure 1.5 – Showing the data types of the dataset columns

  7. When some samples aren’t inherently of a string data type, such as empty values or numbers, pandas automatically uses the object data type, which categorizes the entire column as a data type unknown to pandas. Let’s check for empty values:
    dataset.isnull().sum()

    This gives us the following output:

Figure 1.6 – Checking empty values

  8. It looks like the description column has one empty value, as expected. This might be rooted in a mistake made when acquiring the data, or the value might truly be empty. Either way, let’s remove that row, as we can’t recover it, and convert the columns into strings:
    dataset.dropna(inplace=True)
    for column in ['category', 'description']:
        dataset[column] = dataset[column].astype("string")
  9. Earlier, we discovered four unique categories. Let’s make sure we have a decent number of samples for each category by visualizing their distribution:
    sns.countplot(x="category", data=dataset)

    This will result in the following figure:

Figure 1.7 – A graph showing category distribution

Each category has a good number of data samples, and it doesn’t look like there are any anomalous categories.

  10. The goal here is to predict the category of an item being sold through the item’s description on the Indian e-commerce website. From that context, we know that many Indian citizens speak Hindi, so the dataset might not contain only English data. Let’s try to estimate and verify the languages available in the dataset using an open source language detection tool called Lingua. Lingua uses both rule-based and machine learning model-based methods to detect more than 70 languages, works well for short phrases, single words, and sentences, and offers good runtime and memory performance. Let’s start by initializing the language detector instance from the lingua library:
    detector = LanguageDetectorBuilder.from_all_languages(
          ).with_preloaded_language_models().build()
  11. Now, we will randomly sample a small portion of the dataset for language detection, as the detection algorithm takes time to complete. Using a 10% fraction of the data should be enough to adequately understand it:
    sampled_dataset = dataset.sample(frac=0.1, random_state=1234)
    sampled_dataset['language'] = sampled_dataset[
        'description'
    ].progress_apply(lambda x: detector.detect_language_of(x))
  12. Now, let’s visualize the distribution of the detected languages:
    sampled_dataset['language'].value_counts().plot(kind='bar')

    This will show the following graph plot:

Figure 1.8 – Text language distribution

  13. Interestingly, Lingua detected some samples that aren’t English. Some of these detected languages look like they might be mistakes made by Lingua. Hindi is also detected among them; this is more plausible than the other languages, as the data comes from an Indian e-commerce website. Let’s check these samples out:
    sampled_dataset[
        sampled_dataset['language'] == Language.HINDI
    ].description.iloc[0]

    This will show the following text:

Figure 1.9 – Visualizing Hindi text

  14. It looks like there is a mix of Hindi and English here. How about another language, such as French?
    sampled_dataset[
        sampled_dataset['language'] == Language.FRENCH
    ].description.iloc[0]

    This will show the following text:

Figure 1.10 – Visualizing French text

  15. It looks like potpourri was the word that triggered the detection, as this is a borrowed French word, but the text is still generally English.
  16. Since the detected languages do not include any that don’t use spaces as separators between logical word units, let’s gauge the distribution of words using space-based word separation. Word counts and character counts can affect the parameters of a deep learning neural network, so it is useful to understand these values during EDA:
    dataset['word_count'] = dataset['description'].apply(
        lambda x: len(x.split())
    )
    plt.figure(figsize=(15,4))
    sns.histplot(data=dataset, x="word_count", bins=10)

    This will show the following bar plot:

Figure 1.11 – Word count distribution

From the exploration and analysis of the text data, we can deduce a few insights that will help us decide on the model type and structure we should use during the model development stage:

  • The labels are decently sampled, with 5,000-11,000 samples per label, making the dataset suitable for deep learning algorithms.
  • The original data is not clean; it has missing values and duplicates, but it is fixable through manual processing. Using it as-is for model development could create a biased model.
  • The dataset has more than one language but mostly English text; this will allow us to make appropriate model choices during the model development stage.
  • Most samples have fewer than 1,000 words, while some have 1,000-8,000 words. In non-critical use cases, we can safely cap the number of words at around 1,000 so that we can build a model with better memory and runtime performance.

The preceding practical example should provide a simple taste of performing EDA, sufficient to understand the benefit and importance of running an in-depth EDA before going into the model development stage. Similar to the practical text EDA, we have prepared practical EDA sample workflows for other dataset types, including audio, image, and video datasets, in our Packt GitHub repository; you should explore them to get your hands dirty.

A major concept to grasp in this section is the importance of EDA and the level of curiosity you should display to uncover the truth about your data. Some methods are generalizable to other similar datasets, but treating any specific EDA workflow as a silver bullet blinds you to the increasing research people are contributing to this field. Ask questions about your data whenever you suspect something of it and attempt to uncover the answers yourself by doing manual or automated inspections however possible. Be creative in obtaining these answers and stay hungry in learning new ways you can figure out key information on your data.

In this section, we have methodologically and practically gone through EDA processes for different types of data. Next, we will explore what it takes to prepare the data for actual model consumption.

Data pre-processing

Data pre-processing involves data cleaning, data structuring, and data transformation so that a deep learning model will be capable of using the processed data for model training, evaluation, and inferencing during deployment. The processed data should not only be in a form the machine learning model can accept, but should generally be processed in a way that optimizes the learning potential and improves the metric performance of the machine learning model.

Data cleaning is a process that aims to increase the quality of the acquired data. An EDA process is a prerequisite for figuring out anything wrong with the dataset before some form of data cleaning can be done. Data cleaning and EDA are often executed iteratively until a satisfactory data quality level is achieved. Cleaning can be as simple as removing duplicate values, removing empty values, or removing values that don’t make logical sense, either in terms of common sense or through business logic. These are concepts that we explained earlier, and the same risks and issues apply.

Data structuring, on the other hand, is the process that orchestrates ingesting and loading the stored data that has been cleaned and verified for quality. This process determines how data should be loaded from one or multiple sources and fed into the deep learning model. Sounds simple enough, right? It could be very simple for a small, single CSV dataset with no performance or memory issues. In reality, it can be very complex when data is partitioned and stored across different sources due to storage limitations. Here are some concrete factors you’d need to consider in this process:

  • Do you have enough RAM in your computer to process your desired batch size to supply data for your model? Make sure you also take your model size into account so that you won’t get memory overloads and Out of Memory (OOM) errors!
  • Is your data from different sources? Make sure you have permission to access these data sources.
  • Is the speed latency when accessing these sources acceptable? Consider moving this data to a better hardware resource that you can access with higher speeds, such as a solid-state drive (SSD) instead of a hard disk drive (HDD), and from a remote network-accessible source to a direct local hardware source.
  • Do you even have enough local storage to store this data? Make sure you do; don’t overload the storage and risk performance slowdowns or, worse, computer breakdowns.
  • Optimize the data loading and processing pipeline so that it is fast. Store and cache the outputs of data processes that are fixed so that you don’t waste time recomputing them.
  • Make sure the data structuring process is deterministic, even when some steps require randomness; that is, the randomness should be reproducible when the cycle is repeated. Determinism helps ensure that the results obtained can be reproduced and that model-building methods can be compared fairly and reliably.
  • Log data so that you can debug the process when needed.
  • Data partitioning methods. Make sure a proper cross-validation strategy is chosen that is suitable for your dataset. If a time-based feature is included, consider whether you should construct a time-based partitioning method where the training data consists of earlier examples and the evaluation data lies in the future. If not, a stratified partitioning method would be your best bet (see the sketch after this list).
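
As a minimal sketch of the last two points, the snippet below fixes random seeds for reproducibility and performs a stratified split with scikit-learn; the sample data, seed, and split ratio are assumptions for illustration, and for time-based features you would instead sort by timestamp and split chronologically:

    import random
    import numpy as np
    from sklearn.model_selection import train_test_split

    # Fix seeds so that any randomness is reproducible across runs
    SEED = 1234
    random.seed(SEED)
    np.random.seed(SEED)

    # Hypothetical sample indices and their labels (four classes)
    samples = np.arange(1000)
    labels = samples % 4

    # A stratified split keeps the label distribution similar in both partitions
    train_x, val_x, train_y, val_y = train_test_split(
        samples, labels, test_size=0.2, stratify=labels, random_state=SEED
    )
    print(len(train_x), len(val_x))  # 800 200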

Different deep learning frameworks, such as PyTorch and TensorFlow, provide different application programming interfaces (APIs) to implement the data structuring process. Some frameworks provide simpler interfaces that allow for easy pipeline setup, while others provide complex interfaces that allow for a higher level of flexibility. Fortunately, many high-level libraries attempt to simplify the complex interfaces while maintaining flexibility, such as Keras on top of TensorFlow, and Catalyst, fastai, PyTorch Lightning, and Ignite on top of PyTorch.

Finally, data transformation is a process that applies variable-specific pre-processing to transform the raw, cleaned data into a more representable, usable, and learnable format. An important factor to consider when attempting to execute the data structuring and transformation process is the type of deep learning model you intend to use. Any form of data transformation is often dependent on the deep learning architecture and on the type of inputs it can accept. The most widely known and common deep learning model architectures were invented to tackle specific data types, such as convolutional neural networks for image data, transformer models for sequence-based data, and basic multilayer perceptrons for tabular data. However, deep learning models are flexible algorithms that can twist and bend to accept data of different forms and sizes, even in multimodal data conditions. Through collaboration with domain experts over the past few years, deep learning experts have been able to build creative forms of deep learning architectures that handle multiple data modalities, even multiple unstructured data modalities, and have succeeded in learning cross-modality patterns. Here are two examples:

  • Robust Self-Supervised Audio-Visual Speech Recognition, by Meta AI (formerly Facebook) (https://arxiv.org/pdf/2201.01763v2.pdf):
    • This tackled the problem of speech recognition in the presence of multiple simultaneous speech sources by building a transformer-based deep learning model, called AV-HuBERT, that can take in both audio and visual data
    • Visual data acted as supplementary data to help the deep learning model discern which speaker to focus on.
    • It achieved the latest state-of-the-art results on the LRS3-TED visual and audio lip reading dataset
  • Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework, by DAMO Academy and Alibaba Group (https://arxiv.org/pdf/2202.03052v1.pdf):
    • They built a model that took in text and image data and published a pre-trained model
    • It achieved state-of-the-art results on the image captioning task on the COCO Captions dataset

With that being said, data transformations are mainly differentiated into two parts: feature engineering and data scaling. Deep learning is widely known for its feature engineering capabilities, which replace the need to manually craft custom features from raw data for learning. However, this doesn’t mean it never makes sense to perform any feature engineering. Many successful deep learning models have utilized engineered features as input.

Now that we know what data pre-processing entails, let’s discuss and explore different data pre-processing techniques for unstructured data, both theoretically and practically.

Text data pre-processing

Text data can come in different languages and exist in different domains, ranging from description data to informational documents and natural, human-written comments. Some of the most common text data pre-processing methods used for deep learning are as follows:

  • Stemming: A process that removes the suffixes of words in an attempt to reduce them to their base form. This promotes the cross-usage of the same features for different forms of the same word.
  • Lemmatization: A process that reduces a word to its base form while producing real words. Lemmatization has many of the same benefits as stemming but is considered better due to the linguistically valid word reductions it produces (a short sketch of both follows this list).
  • Text tokenization, by Byte Pair Encoding (BPE): Tokenization is a process that splits text into different parts that will be encoded and used by the deep learning model. BPE is a sub-word-based text-splitting algorithm that lets common words be emitted as a single token while rare words get split into multiple tokens, and these split tokens can reuse representations from matching sub-words. This reduces the vocabulary that can exist at any one time, reduces the number of out-of-vocabulary tokens, and allows token representations to be learned more efficiently.
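Here is a minimal sketch of stemming and lemmatization using NLTK. Note that NLTK is an assumption on our part (it is not in this chapter's technical requirements), and the example words are illustrative.

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # the lemmatizer needs the WordNet corpus

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["running", "studies", "better"]:
    print(word, "->", stemmer.stem(word), "/", lemmatizer.lemmatize(word, pos="v"))
# "running" stems to "run"; lemmatizing it as a verb also yields "run",
# while "studies" stems to the non-word "studi" but lemmatizes to "study".
```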

One less common pre-processing method that is useful for building more generalizable text-based deep learning models is text data augmentation. Text data augmentation can be done in a few ways:

  • Replacing verbs with their synonyms: This can be done by using the synonym dictionaries from the NLTK library’s WordNet English lexical database. The augmented text maintains the same meaning after the synonym replacement (see the sketch after this list).
  • Back translation: This involves translating text into another language and back to the original language, using translation services such as Google Translate or open source translation models. The back-translated text ends up in a slightly different form while keeping a similar meaning.
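The snippet below sketches the synonym-replacement idea with NLTK's WordNet; the sentence, the word being replaced, and the choice of simply taking the first synonym are all illustrative assumptions rather than this chapter's actual implementation.

```python
import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)

sentence = "The model learns patterns from data"
word_to_replace = "learns"

# Collect WordNet verb synonyms for the chosen word.
synonyms = {
    lemma.name().replace("_", " ")
    for synset in wordnet.synsets(word_to_replace, pos=wordnet.VERB)
    for lemma in synset.lemmas()
    if lemma.name().lower() != word_to_replace.lower()
}

# Swap one synonym in to produce an augmented variant of the sentence.
if synonyms:
    augmented = sentence.replace(word_to_replace, sorted(synonyms)[0])
    print(augmented)
```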

Audio data pre-processing

Audio data is essentially sequence-based data and, in some cases, multiple sequences exist. One of the most commonly used pre-processing methods is to transform raw audio into some form of spectrogram using the Short-Time Fourier Transform (STFT), a process that converts audio from the time domain into the frequency domain. A spectrogram allows audio data to be broken down and represented across a range of frequencies instead of as a single waveform that combines the signals from all frequencies. Spectrograms are two-dimensional and can therefore be treated as images and fed into convolutional neural networks. Data scaling methods such as log scaling and log-mel scaling are also commonly applied to spectrograms to further emphasize frequency characteristics.
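The following is a minimal sketch of this conversion using torchaudio, which is an assumption on our part (librosa offers equivalent functions); the waveform is random placeholder data and the STFT parameters are illustrative.

```python
import torch
import torchaudio.transforms as T

waveform = torch.randn(1, 16_000)  # one second of placeholder mono audio at 16 kHz

# Plain STFT-based spectrogram: frequency bins over time.
spectrogram = T.Spectrogram(n_fft=400, hop_length=160)(waveform)

# Mel spectrogram followed by log (dB) scaling, as discussed above.
mel = T.MelSpectrogram(sample_rate=16_000, n_fft=400, hop_length=160, n_mels=64)(waveform)
log_mel = T.AmplitudeToDB()(mel)

print(spectrogram.shape, log_mel.shape)  # e.g. (1, 201, 101) and (1, 64, 101)
```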

Image data pre-processing

Image data augmentation is an image-based feature engineering technique that can increase the effective coverage of the original data. A best practice when applying this technique is to structure the data pipeline so that image augmentations are applied randomly, batch by batch, during the training process, rather than providing the deep learning model with a fixed augmented set of data. Choosing the type of image augmentation requires some understanding of the business requirements of the use case. Here are some examples where it doesn’t make sense to apply certain augmentations:

  • When the orientation of the image affects the validity of the target label, orientation modification types of augmentation such as rotation and image flipping wouldn’t be suitable
  • When the color of the image affects the validity of the target label, color modification types of augmentation such as grayscale, channel shuffle, hue saturation shift, and RGB shift aren’t suitable

After excluding the augmentations that are obviously unsuitable, a common but effective way to figure out the best set of augmentations is iterative experimentation and model comparison (a short pipeline sketch follows).
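As an illustration, here is a minimal sketch of a random, per-batch augmentation pipeline using torchvision; torchvision is an assumption (libraries such as albumentations work similarly), and the specific transforms and parameters are placeholders.

```python
import torch
from torchvision import transforms

train_augmentations = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),  # skip if orientation affects the target label
    transforms.ColorJitter(brightness=0.2),  # skip if color affects the target label
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
])

image = transforms.ToPILImage()(torch.rand(3, 256, 256))  # placeholder image

# Because the transforms are random, the same image yields different
# augmented versions every time it is drawn during training.
augmented_a = train_augmentations(image)
augmented_b = train_augmentations(image)
```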

Developing deep learning models

Let’s start with a short recap of what deep learning is. Deep learning’s core foundational building block is the neural network, an algorithm loosely modeled on the human brain. Its building blocks are called neurons, mimicking the billions of neurons the human brain contains. Neurons, in the context of neural networks, are objects that store simple pieces of information called weights and biases; think of these as the memory of the algorithm.

Deep learning architectures are essentially neural network architectures with three or more neural network layers. Neural network layers can be categorized into three high-level groups – the input layer, the hidden layers, and the output layer. The input layer is the simplest group; its only function is to pass the input data to subsequent layers. This group contains no biases and its neurons can be considered passive, but it still holds weights in its connections to the neurons of subsequent layers. The hidden layers comprise neurons that contain biases, along with weights in their connections to neurons in subsequent layers. Finally, the output layer comprises neurons whose number relates to the number of classes and the problem type, and these neurons also contain biases. A best practice when counting neural network layers is to exclude the input layer, so a network with one input layer, one hidden layer, and one output layer is considered a two-layer neural network. The following figure shows a basic neural network, called a multilayer perceptron (MLP), with a single input layer, a single hidden layer, and a single output layer:

Figure 1.12 – A simple deep learning architecture, also called an MLP

Being a subset of the wider machine learning category, deep learning models are capable of learning patterns from data through a loss function and an optimizer algorithm that minimizes that loss. A loss function quantifies the error made by the model so that its memory (weights and biases) can be updated to perform better in the next iteration. An optimizer is the algorithm that decides the strategy for updating the weights and biases given the loss value.
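To tie the recap together, here is a minimal PyTorch sketch of an MLP like the one in Figure 1.12, trained for a single step with a loss function and an optimizer; the layer sizes, random batch of data, and hyperparameters are illustrative assumptions.

```python
import torch
from torch import nn

model = nn.Sequential(       # the input layer is implicit: it simply passes the 16 features through
    nn.Linear(16, 32),       # hidden layer: weights and biases, followed by a non-linearity
    nn.ReLU(),
    nn.Linear(32, 2),        # output layer: one neuron per class
)
loss_fn = nn.CrossEntropyLoss()                             # defines the error the model makes
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # decides how to update the memory

features = torch.randn(32, 16)            # one batch of placeholder data
labels = torch.randint(0, 2, (32,))       # placeholder class labels

logits = model(features)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()       # compute gradients of the loss with respect to weights and biases
optimizer.step()      # apply the optimizer's update strategy
```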

With this short recap, let’s dive into a summary of the common deep learning model families.

Deep learning model families

These layers can come in many forms, as researchers keep inventing new layer definitions to tackle new problem types, and they almost always come with a non-linear activation function that allows the model to capture non-linear relationships in the data. Along with this variety of layers come many different deep learning architecture families that are meant for different problem types. A few of the most common and widely used deep learning models are as follows:

  • MLP for tabular data types. This will be explored in Chapter 2, Designing Deep Learning Architectures.
  • Convolutional neural network for image data types. This will be explored in Chapter 3, Understanding Convolutional Neural Networks.
  • Autoencoders for anomaly detection, data compression, data denoising, and feature representation learning. This will be explored in Chapter 5, Understanding Autoencoders.
  • Gated recurrent unit (GRU), Long Short-Term Memory (LSTM), and Transformers for sequence data types. These will be explored in Chapter 4, Understanding Recurrent Neural Networks, and Chapter 6, Understanding Neural Network Transformers, respectively.

These architectures will be the focus of Chapters 2 to 6, where we will discuss their methodology and go through some practical evaluation. Next, let’s go through the model development strategy.

The model development strategy

Today, deep learning models are easy to build and experiment with thanks to the advent of deep learning frameworks such as PyTorch and TensorFlow, along with their high-level library wrappers. Which framework you choose at this point is largely a matter of interface preference, as both have matured through years of improvement. Only when there is a pressing need for a very custom function to tackle a unique problem type will you need to choose the framework that can execute what you need. Once you’ve chosen your deep learning framework, model creation, training, and evaluation are well covered.

However, model management functions do not come out of the box with these frameworks. Model management is an area of technology that allows teams, businesses, and deep learning practitioners to reliably, quickly, and effectively build models, evaluate them, deliver model insights, deploy models to production, and govern them. Model management is sometimes referred to as machine learning operations (MLOps). You might still be wondering why you’d need such functionality, especially if you’ve only been building deep learning models on Kaggle, a platform that hosts data and machine learning problems as competitions. Here are some factors that drive the need for it:

  • It is cumbersome to compare models manually:
    • Manually typing performance data into an Excel sheet to keep track of model performance is slow and unreliable
  • Model artifacts are hard to keep track of:
    • A model has many artifacts, such as its trained weights, performance graphs, feature importance, and prediction explanations
    • It is also cumbersome to compare model artifacts
  • Model versioning is needed to make sure model-building experiments are not repeated:
    • Overwriting the top-performing model with the most reliable model insights is the last thing you want to experience
    • Versioning should depend on the data partitioning method, model settings, and software library versions
  • It is not straightforward to deploy and govern models

Depending on the size of the team involved in the project and how often components need to be reused, different software and libraries will fit the bill. These tools are split into paid and free (usually open source) categories. Metaflow, an open source tool, is suitable for bigger data science teams where components are likely to be reused across other projects, while MLflow (also open source) is more suitable for small or single-person teams. Other notable model management tools are Comet (paid), Weights & Biases (paid), Neptune (paid), and Algorithmia (paid).
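As a taste of what such tooling provides, here is a minimal sketch of experiment tracking with MLflow, one of the open source options mentioned above; the run name, parameters, and metric values are illustrative, not taken from this book's projects.

```python
import mlflow

with mlflow.start_run(run_name="baseline-mlp"):
    # Record the settings that define this experiment so it can be reproduced and versioned.
    mlflow.log_param("learning_rate", 1e-3)
    mlflow.log_param("hidden_units", 32)

    # Record performance so models can be compared without manual spreadsheets.
    mlflow.log_metric("validation_accuracy", 0.87)

    # Artifacts such as trained weights or performance plots can also be attached, for example:
    # mlflow.log_artifact("model_weights.pt")
```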

With that, we have provided a brief overview of deep learning model development methodology and strategy; we will dive deeper into model development topics in the next few chapters. But before that, let’s continue with an overview of the topic of delivering model insights.


Key benefits

  • Interpret your models’ decision-making process, ensuring transparency and trust in your AI-powered solutions
  • Gain hands-on experience in every step of the deep learning life cycle
  • Explore case studies and solutions for deploying DL models while addressing scalability, data drift, and ethical considerations
  • Purchase of the print or Kindle book includes a free PDF eBook

Description

Deep learning enables previously unattainable feats in automation, but extracting real-world business value from it is a daunting task. This book will teach you how to build complex deep learning models and gain intuition for structuring your data to accomplish your deep learning objectives. This deep learning book explores every aspect of the deep learning life cycle, from planning and data preparation to model deployment and governance, using real-world scenarios that will take you through creating, deploying, and managing advanced solutions. You’ll also learn how to work with image, audio, text, and video data using deep learning architectures, as well as optimize and evaluate your deep learning models objectively to address issues such as bias, fairness, adversarial attacks, and model transparency. As you progress, you’ll harness the power of AI platforms to streamline the deep learning life cycle and leverage Python libraries and frameworks such as PyTorch, ONNX, Catalyst, MLFlow, Captum, Nvidia Triton, Prometheus, and Grafana to execute efficient deep learning architectures, optimize model performance, and streamline the deployment processes. You’ll also discover the transformative potential of large language models (LLMs) for a wide array of applications. By the end of this book, you'll have mastered deep learning techniques to unlock its full potential for your endeavors.

Who is this book for?

This book is for deep learning practitioners, data scientists, and machine learning developers who want to explore deep learning architectures to solve complex business problems. Professionals in the broader deep learning and AI space will also benefit from the insights provided, applicable across a variety of business use cases. Working knowledge of Python programming and a basic understanding of deep learning techniques is needed to get started with this book.

What you will learn

  • Use neural architecture search (NAS) to automate the design of artificial neural networks (ANNs)
  • Implement recurrent neural networks (RNNs), convolutional neural networks (CNNs), BERT, transformers, and more to build your model
  • Deal with multi-modal data drift in a production environment
  • Evaluate the quality and bias of your models
  • Explore techniques to protect your model from adversarial attacks
  • Get to grips with deploying a model with DataRobot AutoML
Product Details

Publication date: Dec 29, 2023
Length: 516 pages
Edition: 1st
Language: English
ISBN-13: 9781803243795




Table of Contents

Part 1 – Foundational Methods
Chapter 1: Deep Learning Life Cycle
Chapter 2: Designing Deep Learning Architectures
Chapter 3: Understanding Convolutional Neural Networks
Chapter 4: Understanding Recurrent Neural Networks
Chapter 5: Understanding Autoencoders
Chapter 6: Understanding Neural Network Transformers
Chapter 7: Deep Neural Architecture Search
Chapter 8: Exploring Supervised Deep Learning
Chapter 9: Exploring Unsupervised Deep Learning
Part 2 – Multimodal Model Insights
Chapter 10: Exploring Model Evaluation Methods
Chapter 11: Explaining Neural Network Predictions
Chapter 12: Interpreting Neural Networks
Chapter 13: Exploring Bias and Fairness
Chapter 14: Analyzing Adversarial Performance
Part 3 – DLOps
Chapter 15: Deploying Deep Learning Models to Production
Chapter 16: Governing Deep Learning Models
Chapter 17: Managing Drift Effectively in a Dynamic Environment
Chapter 18: Exploring the DataRobot AI Platform
Chapter 19: Architecting LLM Solutions
Index
Other Books You May Enjoy

