
Machine Learning for Data Mining: Improve your data mining capabilities with advanced predictive modeling

By Jesus Salcedo
Book Apr 2019 252 pages 1st Edition

Machine Learning for Data Mining

Introducing Machine Learning Predictive Models

A large percentage of data mining opportunities involve machine learning, and these opportunities often come with greater financial rewards. This chapter will give you the basic knowledge that you need to bring the power of machine learning into your data mining work. In this chapter, we're going to talk about the characteristics of machine learning models and also see some examples of these models.

The following are the topics that we will be covering in this chapter:

  • Characteristics of machine learning predictive models
  • Types of machine learning predictive models
  • Working with neural networks
  • A sample neural network model

Characteristics of machine learning predictive models

Knowing the characteristics of machine learning predictive models will help you understand the advantages and limitations in comparison to any statistical or decision tree models.

Let's get some insights on a few characteristics of predictive models in machine learning:

  • Optimized to learn complex patterns: Machine learning models are designed to learn complex patterns. Compared with statistical models or decision tree models, they excel when the patterns in the data are very complex.
  • Account for interactions and nonlinear relationships: Machine learning predictive models can account for interactions in the data and nonlinear relationships to an even better degree than decision tree models.
  • Few assumptions: These models are powerful because they have very few assumptions. They can also be used with different types of data.
  • A black box model's interpretation is not straightforward: Machine learning predictive models are black box models, which is one of their drawbacks, because their interpretation is not straightforward. When a model builds many different equations and combines them, it becomes very difficult to see exactly how each variable interacted with the others and affected the output variable. So, machine learning predictive models are great when it comes to predictive accuracy, but they're not that good for understanding the mechanics behind a prediction.

If you want to predict something, these models often do so with high accuracy. But if you want to know why something is being predicted, or what you would need to change to avoid a particular prediction, the model is difficult to decipher.

Types of machine learning predictive models

The following are some of the different types of machine learning predictive models:

  • Neural networks
  • Support Vector Machines
  • Random forest
  • Naive Bayesian algorithms
  • Gradient boosting algorithms
  • K-nearest neighbors
  • Self-learning response model

We won't be covering all of them, but we'll focus on a very interesting model – the neural network. In the following sections, we will get an in-depth view of what neural networks are.

Working with neural networks

Neural networks were initially developed in an attempt to understand how the brain operates. They were originally used in the areas of neuroscience and linguistics.

In these fields, researchers noticed that something happened in the environment (input), the individual processed the information (in the brain), and then reacted in some way (output).

So, the idea behind neural networks or neural nets is that they will serve as a brain, which is like a black box. We then have to try to figure out what is going on so that the findings can be applied.

Advantages of neural networks

The following are the advantages of using a neural network:

  • Good for many types of problems: They work well with most of the complex problems that you might come across.
  • They generalize very well: Accurate generalization is a very important feature.
  • They are very common: Neural networks have become very common in today's world, and they are readily accepted and implemented for real-world problems.
  • A lot is known about them: Owing to the popularity that neural networks have gained, there is a lot of research being done and implemented successfully in different areas, so there is a lot of information available on neural networks.
  • Works well with non-clustered data: Neural networks are powerful and robust solutions when the data itself is very complex, when there are many interactions, or when the relationships are nonlinear.

Disadvantages of neural networks

Good models come at the cost of a few disadvantages:

  • They take time to train: Neural networks do take a long time to train; they are generally slower than a linear regression model or a decision tree model, as these basically just do one pass on the data, while, with neural networks, you actually go through many, many iterations.
  • The best solution is not guaranteed: You're not guaranteed to find the best solution. This also means that, in addition to running a single neural network through many iterations, you'll also need to run it multiple times using different starting points so that you can try to get closer to the best solution.
  • Black boxes: As we discussed earlier, it is hard to decipher what gave a certain output and how.

Representing the errors

While building our neural network, our actual goal is to build the best possible solution, and not to get stuck with a sub-optimal one. We'll need to run a neural network multiple times.

Consider this error graph as an example:

This is a graph depicting the amount of error in different solutions. The Global Solution is the best possible solution, the one with the lowest error. A Sub-Optimal Solution is one where training terminates because the model gets stuck and no longer improves, even though it isn't the best solution.
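To make the idea of running from different starting points concrete, here is a minimal sketch. It does not train a real neural network; it assumes a made-up one-dimensional error curve (the `error` function below is hypothetical and chosen only because it has one global and one sub-optimal minimum) and uses plain gradient descent as a stand-in for training.

```python
def error(w):
    # Hypothetical error curve with two valleys: a global minimum near
    # w = -1.30 and a sub-optimal one near w = 1.13.
    return w ** 4 - 3 * w ** 2 + w

def gradient(w):
    # Derivative of the error curve above.
    return 4 * w ** 3 - 6 * w + 1

def descend(start, lr=0.01, steps=2000):
    """Follow the slope downhill from a given starting point."""
    w = start
    for _ in range(steps):
        w -= lr * gradient(w)
    return w

# Run the same procedure from several starting points and keep the best result.
starts = [-2.0, -0.5, 0.5, 2.0]
solutions = [descend(s) for s in starts]
best = min(solutions, key=error)

for s, w in zip(starts, solutions):
    print(f"start={s:5.1f} -> w={w:7.4f}, error={error(w):7.4f}")
print("best solution:", round(best, 4))
```

In this sketch, the runs that start at 0.5 and 2.0 settle in the shallower valley, while the runs that start at -2.0 and -0.5 reach the deeper one. A single run would not tell us which valley we ended up in, which is why several starting points are tried.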

Types of neural network models

There are different types of neural networks available for us; in this section, we will gain insights into these.

Multi-layer perceptron

The most common type is called the multi-layer perceptron model. This neural network model consists of neurons represented by circles, as shown in the following diagram. These neurons are organized into layers:

Every multi-layer perceptron model will have at least three layers:

  • Input Layer: This layer consists of all the predictors in our data.
  • Output Layer: This will consist of the outcome variable, which is also known as the dependent variable or target variable.
  • Hidden Layer: This layer is where you maximize the power of a neural network. Non-linear relationships can also be created in this layer, and all the complex interactions are carried out here. You can have many such hidden layers.

You will also notice in the preceding diagram that every neuron in a layer is connected to every neuron in the next layer. This forms connections, and every connecting line will have a weight associated with it. These weights will form different equations in the model.

Why are weights important?

Weights are important for several reasons. First, because every neuron in one layer is connected to every neuron in the next layer, the weights are what connect the layers. It also means that a neural network model, unlike many other models, doesn't drop any predictors. So, for example, you may start off with 20 predictors, and all 20 will be kept. Second, weights provide information on the impact or importance of each predictor to the prediction. As will be shown later, these weights start off random; however, through multiple iterations, they are modified so as to provide meaningful information.
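As a rough illustration of these two points, the following sketch builds random starting weights for a fully connected network. The layer sizes (20 predictors, 4 hidden neurons, 1 output), the weight range, and the `init_weights` helper are all assumptions made for this example, not values taken from the book.

```python
import random

def init_weights(layer_sizes, seed=0):
    """Create one random weight per connection between consecutive layers."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]

# Hypothetical network: 20 predictors, 4 hidden neurons, 1 output neuron.
weights = init_weights([20, 4, 1])

# Every predictor keeps a row of weights, so none of the 20 is dropped.
print("predictors kept:", len(weights[0]))

# Total number of connecting lines: 20 * 4 + 4 * 1.
n_connections = sum(len(row) for layer in weights for row in layer)
print("connections:", n_connections)
```

These starting values carry no meaning yet; they only become informative once training has adjusted them.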

An example representation of a multilayer perceptron model

Here, we will look at an example of a multilayer perceptron model. We will try to predict a potential buyer of a particular item based on an individual's age, income, and gender.

Consider the following, for example:

As you can see, our input predictors that form the Input Layer are age, income, and gender. The outcome variable that forms our Output Layer is Buy, which will determine whether someone bought a product or not. There is a hidden layer where the input predictors end up combining.
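Here is a minimal sketch of how such a network could turn the three predictors into a Buy score. Everything numeric is hypothetical: the weights are invented, the inputs are assumed to be rescaled to the 0 to 1 range with gender coded as 0 or 1, and the sigmoid activation is a common choice rather than something this chapter specifies.

```python
import math

def sigmoid(z):
    # Squashes any number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def predict_buy(age, income, gender, hidden_weights, output_weights):
    """Feed the three predictors through one hidden layer to a Buy score."""
    inputs = [age, income, gender]
    # Each hidden neuron combines all three inputs with its own weights.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)))
        for neuron_weights in hidden_weights
    ]
    # The output neuron combines the hidden neurons.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

# Hypothetical weights: three hidden neurons, each with one weight per input.
hidden_weights = [
    [0.8, -0.4, 0.3],
    [-0.6, 0.9, 0.1],
    [0.2, 0.5, -0.7],
]
output_weights = [1.2, -0.8, 0.5]

score = predict_buy(0.35, 0.60, 1, hidden_weights, output_weights)
print("Buy score:", round(score, 3))
```

The score can be read as how strongly the network leans toward Buy for this individual; a cut-off such as 0.5 would turn it into a yes or no prediction.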

To better understand what goes on behind the scenes of a neural network model, let's take a look at a linear regression model.

The linear regression model

Let's understand the linear regression model with the help of an example.

Consider the following:

In linear regression, every input predictor in the Input Layer is connected to the outcome field by a single connection weight, also known as the coefficient, and these coefficients are estimated by a single pass through the data. The number of coefficients will be equal to the number of predictors. This means that every predictor will have a coefficient associated with it.

Every input predictor is directly connected to the Target with a particular coefficient as its weight. So, we can easily see the impact of a one unit change in the input predictor on the outcome variable or the Target. These kinds of connections make it easy to determine the effect of each predictor on the Target variable as well as on the equation.
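The following sketch shows both ideas for the simplest case of a single predictor: the coefficient is estimated in one pass over the data, and a one unit change in the predictor moves the prediction by exactly that coefficient. The data values and the `fit_simple_regression` helper are made up for illustration.

```python
def fit_simple_regression(xs, ys):
    """Estimate intercept and coefficient with one pass over the data."""
    n = 0
    sum_x = sum_y = sum_xy = sum_xx = 0.0
    for x, y in zip(xs, ys):  # the single pass through the data
        n += 1
        sum_x += x
        sum_y += y
        sum_xy += x * y
        sum_xx += x * x
    # Ordinary least squares formulas for one predictor.
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope

# Hypothetical data that follows y = 1 + 2x exactly.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
intercept, slope = fit_simple_regression(xs, ys)

def predict(x):
    return intercept + slope * x

# A one unit change in the predictor changes the prediction by the coefficient.
one_unit_effect = predict(11) - predict(10)
print("intercept:", intercept, "coefficient:", slope)
print("effect of a one unit change:", one_unit_effect)
```

Because there is one direct connection per predictor, the coefficient alone tells us the effect, whatever the starting value of the predictor.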

A sample neural network model

Let's use an example to understand neural networks in more detail:

Notice that every neuron in the Input Layer is connected to every neuron in the Hidden Layer. For example, Input 1 is connected to the first, second, and third neurons in the Hidden Layer. This implies that Input 1 has three different weights, and these weights will be a part of three different equations.

This is what happens in this example:

  • The Hidden Layer intervenes between the Input Layer and the Output Layer.
  • The Hidden Layer allows for more complex models with nonlinear relationships.
  • There are many equations, so the influence of a single predictor on the outcome variable occurs through a variety of paths.
  • The interpretation of weights won't be straightforward.
  • Weights correspond to the variable importance; they will initially be random, and then they will go through a bunch of different iterations and will be changed based on the feedback of the iterations. They will then have their real meaning of being associated with variable importance.
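To see why the interpretation of weights is not straightforward, consider this small sketch. It assumes a hypothetical network with two inputs, two hidden neurons using a tanh activation, and hand-picked weights of 1.0 and -1.0; none of these choices come from the book. The same one unit change in the first input has a very different effect depending on the value of the second input.

```python
import math

def network_output(x1, x2):
    # Input 1 reaches the output through two paths, one per hidden neuron.
    h1 = math.tanh(1.0 * x1 + 1.0 * x2)
    h2 = math.tanh(1.0 * x1 - 1.0 * x2)
    return 1.0 * h1 + 1.0 * h2

# Effect of raising input 1 from 0 to 1 while input 2 is held at 0.
effect_when_x2_is_0 = network_output(1, 0) - network_output(0, 0)

# The same change in input 1 while input 2 is held at 2.
effect_when_x2_is_2 = network_output(1, 2) - network_output(0, 2)

print(round(effect_when_x2_is_0, 4))  # about 1.52
print(round(effect_when_x2_is_2, 4))  # about 0.23
```

In a linear regression, both numbers would be identical and equal to the coefficient. Here there is no single number that summarizes the influence of input 1, because its effect travels through several paths that also depend on input 2.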

So, let's go ahead and see how these weights are determined and how we can form a functional neural network.

Feed-forward backpropagation

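Feed-forward backpropagation is the method used to determine the weights of a neural network and, ultimately, its predictions: the inputs are fed forward through the network to produce a prediction, and the prediction error is propagated backward to adjust the weights.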

According to this method, the following iterations occur on predictions:

  • If a prediction is correct, the weight associated with it is strengthened. Imagine the neural network saying, Hey, you know what, we used the weight of 0.75 for the first part of this equation for the first predictor and we got the correct prediction; that's probably a good starting point.
  • Suppose the prediction is incorrect; the error is fed back, or backpropagated, into the model so that the weights or weight coefficients are modified, as shown here:

This backpropagation won't just take place in-between the Hidden Layers and the Target layer, but will also take place toward the Input Layer:

While these iterations are happening, we are actually making our neural network better and better with every error propagation. The connections now make a neural network capable of learning different patterns in the data.

So, unlike any linear regression or a decision tree model, a neural network tries to learn patterns in the data. If it's given enough time to learn those patterns, the neural network, combined with its experience, understands and predicts better, improving the rate of accuracy to a great extent.
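The loop described in this section can be sketched in a few lines of plain Python. This is only one common way to write it, under several assumptions that are not from the book: a sigmoid activation, a squared error measure, gradient-based weight updates, a learning rate of 0.5, and a tiny made-up dataset with two inputs and a 0 or 1 target.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Feed the inputs forward; return hidden activations and the prediction."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output

def mean_squared_error(data, params):
    return sum((forward(x, *params)[1] - y) ** 2 for x, y in data) / len(data)

def train(data, n_hidden=2, epochs=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Weights start off random, as described in the text.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_out = 0.0
    initial_loss = mean_squared_error(data, (w_hidden, b_hidden, w_out, b_out))
    for _ in range(epochs):
        for x, y in data:
            hidden, output = forward(x, w_hidden, b_hidden, w_out, b_out)
            # Error signal at the output (squared error gradient, up to a constant).
            delta_out = (output - y) * output * (1 - output)
            # Propagate the error back from the output to each hidden neuron.
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                            for j in range(n_hidden)]
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * hidden[j]
                b_hidden[j] -= lr * delta_hidden[j]
                # The correction continues back toward the Input Layer.
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
            b_out -= lr * delta_out
    final_loss = mean_squared_error(data, (w_hidden, b_hidden, w_out, b_out))
    return initial_loss, final_loss

# Hypothetical data: two inputs and a 0/1 target.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
initial_loss, final_loss = train(data)
print("error before training:", round(initial_loss, 4))
print("error after training: ", round(final_loss, 4))
```

Note that, in this formulation, a nearly correct prediction produces a tiny error signal and therefore leaves the weights almost unchanged, while a wrong prediction produces a larger correction. The error after training should be lower than the error before training, which is the "better and better with every error propagation" behavior described above.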

Model training ethics

When you are training the neural network model, never train the model with the whole dataset. We need to hold back some data for testing purposes. This will allow us to test whether the neural network is able to apply what it has learned from the training dataset to new data.

We want the neural network to generalize well to new data and capture the generalities of different types of data, not just little nuances that would then make it sample-specific. Instead, we want the results to be translated to the new data as well. After the model has been trained, the new data can be predicted using the model's experience.
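Holding back data can be as simple as shuffling the rows and setting a share of them aside. The sketch below assumes a 70/30 split and a fixed random seed; both are illustrative choices, and the `train_test_split` helper here is a hand-written stand-in rather than a function from any particular library.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and hold back a fraction for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]  # copy, so the original order is left untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of 10 records, identified here by row number.
rows = list(range(10))
train_rows, test_rows = train_test_split(rows)

print("training rows:", len(train_rows))
print("testing rows: ", len(test_rows))
```

The model is then fitted on the training rows only, and its accuracy is measured on the testing rows, which it has never seen.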

Summary

I hope you are now clear on machine learning predictive models and have understood the basic concepts. In this chapter, we have seen the characteristics of machine learning predictive models and have learned about some of the different types. These concepts are stepping stones to further chapters. We have also looked at an example of a basic neural network model. In the next chapter, we will implement a live neural network on a dataset and you will also be introduced to support vector machines and their implementation.


Key benefits

  • Learn how to apply machine learning techniques in the field of data science
  • Understand when to use different data mining techniques, how to set up different analyses, and how to interpret the results
  • A step-by-step approach to improving model development and performance

Description

Machine learning (ML) combined with data mining can give you amazing results in your data mining work by empowering you with several ways to look at data. This book will help you improve your data mining techniques by using smart modeling techniques. This book will teach you how to implement ML algorithms and techniques in your data mining work. It will enable you to pair the best algorithms with the right tools and processes. You will learn how to identify patterns and make predictions with minimal human intervention. You will build different types of ML models, such as neural networks, Support Vector Machines (SVMs), and decision trees. You will see how all of these models work and what kind of data they are suited for. You will learn how to combine the results of different models in order to improve accuracy. Topics such as removing noise and handling errors will give you an added edge in model building and optimization. By the end of this book, you will be able to build predictive models and extract information of interest from the dataset.

What you will learn

  • Hone your model-building skills and create the most accurate models
  • Understand how predictive machine learning models work
  • Prepare your data to acquire the best possible results
  • Combine models in order to suit the requirements of different types of data
  • Analyze single and multiple models and understand their combined results
  • Derive worthwhile insights from your data using histograms and graphs

Product Details

Publication date : Apr 30, 2019
Length 252 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781838828974

Table of Contents

7 Chapters
Preface
1. Introducing Machine Learning Predictive Models
2. Getting Started with Machine Learning
3. Understanding Models
4. Improving Individual Models
5. Advanced Ways of Improving Models
6. Other Books You May Enjoy
