Hands-On Neural Networks

Product type: Book
Published: May 2019
Publisher: Packt
ISBN-13: 9781788992596
Pages: 280
Edition: 1st
Authors: Leonardo De Marchi, Laura Mitchell
Table of Contents (16 chapters)

Preface
1. Section 1: Getting Started
2. Getting Started with Supervised Learning
3. Neural Network Fundamentals
4. Section 2: Deep Learning Applications
5. Convolutional Neural Networks for Image Processing
6. Exploiting Text Embedding
7. Working with RNNs
8. Reusing Neural Networks with Transfer Learning
9. Section 3: Advanced Applications
10. Working with Generative Algorithms
11. Implementing Autoencoders
12. Deep Belief Networks
13. Reinforcement Learning
14. What's Next?
15. Other Books You May Enjoy

Supervised learning in practice with Python

As we said earlier, supervised learning algorithms learn to approximate a function mapping inputs to outputs, creating a model that can predict outputs for unseen inputs.

It's conventional to denote inputs as x and outputs as y; both can be numerical or categorical.

We can distinguish between two different types of supervised learning:

  • Classification
  • Regression

Classification is a task where the output variable can assume a finite number of values, called categories. An example of classification would be predicting the type of flower (output) given the sepal length (input). Classification can be further divided into subtypes:

  • Binary classification: The task of predicting whether an instance belongs to one of two classes
  • Multiclass classification: The task (also known as multinomial) of predicting the most probable label (class) for each single instance
  • Multilabel classification: When multiple labels can be assigned to each input
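As a concrete illustration, here is a minimal multiclass example using scikit-learn's built-in iris dataset (the flower-classification example mentioned above); binary classification is simply the two-class special case. The choice of model here is illustrative, not prescribed by the text:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Multiclass classification: predict the iris species (3 classes)
# from four flower measurements, including sepal length
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Predict the most probable class for a few instances
predictions = clf.predict(X[:5])
```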

Regression is a task where the output variable is continuous. Here are some common regression algorithms:

  • Linear regression: This finds linear relationships between inputs and outputs
  • Logistic regression: This finds the probability of a binary output
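Both algorithms can be sketched with scikit-learn on tiny made-up data (the numbers below are invented for illustration only):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Linear regression: recover the linear relationship y = 2x + 1
X = np.arange(10).reshape(-1, 1)      # inputs
y = 2 * X.ravel() + 1                 # continuous outputs
reg = LinearRegression().fit(X, y)    # reg.coef_[0] ~ 2, reg.intercept_ ~ 1

# Logistic regression: probability of a binary output
X_bin = np.array([[0.0], [1.0], [2.0], [3.0]])
y_bin = np.array([0, 0, 1, 1])
log_reg = LogisticRegression().fit(X_bin, y_bin)
proba = log_reg.predict_proba([[1.5]])  # probabilities of the two classes
```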

In general, the supervised learning problem is solved in a standard way by performing the following steps:

  1. Performing data cleaning to make sure the data we are using is as accurate and descriptive as possible.
  2. Performing feature engineering, which involves creating new features out of the existing ones to improve the algorithm's performance.
  3. Transforming the input data into something our algorithm can understand, which is known as data transformation. Some algorithms, such as neural networks, don't work well with unscaled data, as they would naturally give more importance to inputs with a larger magnitude.
  4. Choosing an appropriate model (or a few of them) for the problem.
  5. Choosing an appropriate metric to measure the effectiveness of our algorithm.
  6. Training the model on a subset of the available data, called the training set. It is on this training set that we calibrate the data transformations.
  7. Testing the model on the held-out data.
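The steps above can be sketched end to end with scikit-learn; the iris dataset, model, and metric used here are placeholder choices for illustration, not prescribed by the text:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Step 6: hold out part of the data; the scaler (our data
# transformation) is calibrated on the training set only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Step 3: scale the data so no input dominates by sheer magnitude
scaler = StandardScaler().fit(X_train)   # fit on training data only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)      # reuse the same calibration

# Steps 4 and 5: choose a model and a metric
model = LogisticRegression(max_iter=1000).fit(X_train_s, y_train)

# Step 7: test the model on data it has never seen
accuracy = accuracy_score(y_test, model.predict(X_test_s))
```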

Data cleaning

Data cleaning is a fundamental process for making sure we are able to produce good results at the end. It is task-specific: the cleaning you have to perform on audio data will be different from that for images, text, or time-series data.

We will need to make sure there is no missing data, and if there is, decide how to deal with it. For example, if an instance is missing a few variables, it's possible to fill them with the average value for that variable, fill them with a value that the input cannot assume (such as -1 if the variable lies between 0 and 1), or disregard the instance entirely if we have a lot of data.
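Using pandas, these three options can be sketched on a small hypothetical table (the column names and values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with missing values (NaN)
df = pd.DataFrame({"temperature": [21.0, 23.5, np.nan, 22.0],
                   "humidity": [0.40, 0.50, 0.45, np.nan]})

# Option 1: fill missing values with the average for that variable
filled_mean = df.fillna(df.mean())

# Option 2: fill with a value the input cannot assume,
# e.g. -1 for humidity, which lies between 0 and 1
filled_flag = df.fillna({"humidity": -1})

# Option 3: disregard incomplete instances if we have a lot of data
dropped = df.dropna()
```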

Also, it's good to check whether the data respects the limitations of the quantities we are measuring. For example, a temperature in Celsius cannot be lower than -273.15 degrees; if a data point violates this, we know straight away that it is unreliable.
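A sanity check like the temperature example can be written as a simple filter (the readings below are made up):

```python
import pandas as pd

# Hypothetical Celsius readings; nothing can be colder than
# absolute zero (-273.15 degrees Celsius)
readings = pd.Series([18.2, -300.0, 25.1, -5.0])

# Flag physically impossible data points as unreliable
unreliable = readings < -273.15
clean = readings[~unreliable]
```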

Other checks include the format, the data types, and the variance in the dataset.

It's possible to load some clean data directly from scikit-learn. There are a lot of datasets for all sorts of tasks. For example, if we want to load some image data, we can use the following Python code:

from sklearn.datasets import fetch_lfw_people

# Keep only people with at least 70 images; rescale images to 40% size
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

This data is known as Labeled Faces in the Wild, a dataset for face recognition.
