Search icon CANCEL
Subscription
0
Cart icon
Cart
Close icon
You have no products in your basket yet
Save more on your purchases!
Savings automatically calculated. No voucher code required
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Hands-On Meta Learning with Python
Hands-On Meta Learning with Python

Hands-On Meta Learning with Python: Meta learning using one-shot learning, MAML, Reptile, and Meta-SGD with TensorFlow

By Sudharsan Ravichandiran
NZ$‎64.99
Book Dec 2018 226 pages 1st Edition
eBook
NZ$‎51.99
Print
NZ$‎64.99
Subscription
Free Trial
eBook
NZ$‎51.99
Print
NZ$‎64.99
Subscription
Free Trial

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Black & white paperback book shipped to your address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Buy Now
Table of content icon View table of contents Preview book icon Preview Book

Hands-On Meta Learning with Python

Chapter 1. Introduction to Meta Learning

Meta learning is one of the most promising and trending research areas in the field of artificial intelligence right now. It is believed to be a stepping stone for attaining Artificial General Intelligence (AGI). In this chapter, we will learn about what meta learning is and why meta learning is the most exhilarating research in artificial intelligence right now. We will understand what is few-shot, one-shot, and zero-shot learning and how it is used in meta learning. We will also learn about different types of meta learning techniques. We will then explore the concept of learning to learn gradient descent by gradient descent where we understand how we can learn the gradient descent optimization using the meta learner. Going ahead, we will also learn about optimization as a model for few-shot learning where we will see how we can use meta learner as an optimization algorithm in the few-shot learning setting.

In this chapter, you will learn about the following:

  • Meta learning
  • Meta learning and few-shot
  • Types of meta learning
  • Learning to learn gradient descent by gradient descent
  • Optimization as a model for few-shot learning

Meta learning


Meta learning is an exhilarating research domain in the field of AI right now. With plenty of research papers and advancements, meta learning is clearly making a major breakthrough in AI. Before getting into meta learning, let's see how our current AI model works.

Deep learning has progressed rapidly in recent years with great algorithms such as generative adversarial networks and capsule networks. But the problem with deep neural networks is that we need to have a large training set to train our model and it will fail abruptly when we have very few data points. Let's say we trained a deep learning model to perform task A. Now, when we have a new task, B, that is closely related to A, we can't use the same model. We need to train the model from scratch for task B. So, for each task, we need to train the model from scratch although they might be related.

Is deep learning really the true AI? Well, it is not. How do we humans learn? We generalize our learning to multiple concepts and learn from there. But current learning algorithms master only one task. Here is where meta learning comes in. Meta learning produces a versatile AI model that can learn to perform various tasks without having to train them from scratch. We train our meta learning model on various related tasks with few data points, so for a new related task, it can make use of the learning obtained from the previous tasks and we don't have to train them from scratch. Many researchers and scientists believe that meta learning can get us closer to achieving AGI. We will learn exactly how meta learning models learn the learning process in the upcoming sections.

Meta learning and few-shot

Learning from fewer data points is called few-shot learning or k-shot learning where k denotes the number of data points in each of the classes in the dataset. Let's say we are performing the image classification of dogs and cats. If we have exactly one dog and one cat image then it is called one-shot learning, that is, we are learning from just one data point per class. If we have, say 10 images of a dog and 10 images of a cat, then that is called 10-shot learning. So k in k-shot learning implies a number of data points we have per class. There is also zero-shot learning where we don't have any data points per class. Wait. What? How can we learn when there are no data points at all? In this case, we will not have data points, but we will have meta information about each of the classes and we will learn from the meta information. Since we have two classes in our dataset, that is, dog and cat, we can call it two-way k-shot learning; so n-way means the number of classes we have in our dataset.

In order to make our model learn from a few data points, we will train them in the same way. So, when we have a dataset, D, we sample a few data points from each of the classes present in our data set and we call it as support set. Similarly, we sample some different data points from each of the classes and call it as query set. So we train our model with a support set and test with a query set. We train our model in an episodic fashion—that is, in each episode, we sample a few data points from our dataset, D, prepare our support set and query set, and train on the support set and test on the query set. So, over series of episodes, our model will learn how to learn from a smaller dataset. We will explore this in more detail in the upcoming chapters.

Types of meta learning


Meta learning can be categorized in several ways, right from finding the optimal sets of weights to learning the optimizer. We will categorize meta learning into the following three categories:

  • Learning the metric space
  • Learning the initializations
  • Learning the optimizer

Learning the metric space

In the metric-based meta learning setting, we will learn the appropriate metric space. Let's say we want to learn the similarity between two images. In the metric-based setting, we use a simple neural network that extracts the features from two images and finds the similarity by computing the distance between features of these two images. This approach is widely used in a few-shot learning setting where we don't have many data points. In the upcoming chapters, we will learn about metric-based learning algorithms such as Siamese networks, prototypical networks, and relation networks.

Learning the initializations

In this method, we try to learn optimal initial parameter values. What do we mean by that? Let's say we are a building a neural network to classify images. First, we initialize random weights, calculate loss, and minimize the loss through a gradient descent. So, we will find the optimal weights through gradient descent and minimize the loss. Instead of initializing the weights randomly, if can we initialize the weights with optimal values or close to optimal values, then we can attain the convergence faster and we can learn very quickly. We will see how exactly we can find these optimal initial weights with algorithms such as MAML, Reptile, and Meta-SGD in the upcoming chapters.

Learning the optimizer

In this method, we try to learn the optimizer. How do we generally optimize our neural network? We optimize our neural network by training on a large dataset and minimize the loss using gradient descent. But in the few-shot learning setting, gradient descent fails as we will have a smaller dataset. So, in this case, we will learn the optimizer itself. We will have two networks: a base network that actually tries to learn and a meta network that optimizes the base network. We will explore how exactly this works in the upcoming sections.

Learning to learn gradient descent by gradient descent


Now, we will see one of the interesting meta learning algorithms called learning to learn gradient descent by gradient descent. Isn't the name kind of daunting? Well, in fact, it is one of the simplest meta learning algorithms. We know that, in meta learning, our goal is to learn the learning process. In general, how do we train our neural networks? We train our network by computing loss and minimizing the loss through gradient descent. So, we optimize our model using gradient descent. Instead of using gradient descent can we learn this optimization process automatically?

But how can we learn this? We replace our traditional optimizer (gradient descent) with the Recurrent Neural Network (RNN). But how does this work? How can we replace gradient descent with RNN? If you examine closely, what are we really doing in gradient descent? It is basically a sequence of updates from the output layer to the input layer and we store these updates in a state. So, we can use RNN and store the updates in an RNN cell.

So, the main idea of this algorithm is to replace gradient descent with RNN. But the question is how do RNNs learn? How can we optimize the RNN? For optimizing an RNN, we use gradient descent. So, in a nutshell, we are learning to perform gradient descent through an RNN and that RNN is optimized by gradient descent and that's what is meant by the name learning to learn gradient descent by gradient descent.

We call our RNN, an optimizer and our base network, an optimizee. Let's say we have a model

parameterized by some parameter

. We need to find this optimal parameter

, so that we can minimize the loss. In general, we find this optimal parameter through gradient descent, but now we use the RNN for finding this optimal parameter. So the RNN (optimizer) finds the optimal parameter and sends it to the optimizee (base network); the optimizee uses this parameter, computes the loss, and sends the loss to the RNN. Based on the loss, the RNN optimizes itself through gradient descent and updates the model parameter

.

Confusing? Look at the following diagram: our optimizee (base network) is optimized through our optimizer (RNN). The optimizer sends the updated parameters—that is, weights—to the optimizee and the optimizee uses these weights, calculates the loss, and sends the loss to the optimizer; based on the loss, the optimizer improves itself through gradient descent:

Let's say our base network (optimizee) is parameterized by

and our RNN (optimizer) is parameterized by

. What is the loss function of the optimizer? We know that the optimizer's role (RNN) is to reduce the loss of the optimizee (base network). So the loss of our optimizer is the average loss of the optimizee and it can be represented as follows:

How do we minimize this loss? We minimize this loss through gradient descent by finding the right

. Okay, what does the RNN take as input and what output would it return? Our optimizer, that is, our RNN, takes as input the gradient of optimizee

as well as its previous state

and returns output, an update

that can minimize the loss of our optimizee. Let's denote our RNN by a function

:

In the previous equation, the following applies:

  • is the gradient of our model (optimizee)
    , that is,
  • is the hidden state of the RNN
  • is the parameter for the RNN
  • Outputs
    and
    is the update and next state of the RNN respectively

So, we update our model parameter values using

.

As you can see in the following diagram, our optimizer

at a time t, takes in a hidden state

and a gradient of

as

as inputs, computes

and sends it to our optimizee, where it is added with

 and becomes

for an update at the next time step:

So, in this way, we learn the gradient descent optimization through gradient descent.

Optimization as a model for few-shot learning


We know that, in few-shot learning, we learn from lesser data points, but how can we apply gradient descent in a few-shot learning setting? In a few-shot learning setting, gradient descent fails abruptly due to very few data points. Gradient descent optimization requires more data points to reach the convergence and minimize loss. So, we need a better optimization technique in the few-shot regime. Let's say we have a

 model parameterized by some parameter

. We initialize this parameter

with some random values and try to find the optimal value using gradient descent. Let's recall the update equation of our gradient descent:

In the previous equation, the following applies:

  • is the updated parameter
  • is the parameter value at previous time step
  • is the learning rate
  • is the gradient of loss function with respect to

Doesn't the update equation of gradient descent look familiar? Yes, you guessed it right: it resembles the cell state update equation of LSTM and it can be written as follows:

We can totally relate our LSTM cell update equation with gradient descent as, let's say

= 1, then the following applies:

So, instead of using gradient descent as an optimizer in the few-shot learning regime, we can use LSTM as an optimizer. Our meta learner is the LSTM, which learns the update rule for training our model. So we use two networks: one, our base learner, which learns to perform a task, and the other, the meta learner, which tries to find the optimal parameter. But how does this work?

We know that, in LSTM, we use a forget gate for discarding information that is not required in the memory, and it can be represented as follows:

How can this forget gate be useful in our optimization setting? Let's say we are in a position where the loss is high, and the gradient is close to zero. How can we escape from this position? In this case, we can shrink the parameters of our model and forget some parts of its previous value. So, we can use our forget gate to do that and it takes a current parameter value

, current loss

, current gradient

 and the previous forget gate as the input; it can be represented as follows:

Now let's come to the input gate. We know that the input gate in LSTM is used for deciding what value to update, and it can be represented as follows:

In our few-shot learning setting, we can use this input gate to tune our learning rate to learn quickly while preventing it from divergence:

So, our meta learner learns the optimal value of

and

after several updates.

But still, how does this work?

Let's say we have a base network

parameterized by

and our LSTM meta learner

parameterized by

. Assume that we have a dataset

. We split our dataset as

and

for training and testing respectively. First, we randomly initialize our meta learner parameter

.

 

For some T number of iterations, we randomly sample data points from

, calculate the loss, and then we calculate the gradients of loss with respect to our model parameter

. Now we feed this gradient, loss, and meta learner parameter

to our meta learner. Our meta learner

will return a cell state

and then we update our base network

parameter

 at a time t as

.We repeat this for some N number of times, as shown in the following diagram:

So, after T iterations, we will have an optimal parameter

. But how can we check the performance of

and how can we update our meta learner parameter? We take the test set and compute the loss on our test set with parameter

. Then, we calculate the gradients of the loss with respect to our meta learner parameter

and then we update

, as shown here:

We do this for some n number of iterations and update our meta learner. The overall algorithm is shown here:

Summary


We started off by understanding what meta learning is and how one-shot, few-shot, and zero-shot learning is used in meta learning. We learned that the support set and query set are more like a train set and test set but with k data points in each of the classes. We also saw what n-way k-shot means. Later, we understood different types of meta learning techniques. Then, we explored learning to learn gradient descent by gradient descent where we saw how RNN is used as an optimizer to optimize the base network. Later, we saw optimization as a model for few-shot learning where we used LSTM as a meta learner for optimizing in the few-shot learning setting.

In the next chapter, we will learn about a metric-based meta learning algorithm called the Siamese network and we will see how to use a Siamese network for performing face and audio recognition.

Questions


  1. What is meta learning?
  2. What is few-shot learning?
  3. What is a support set?
  4. What is a query set?
  5. What is metric-based learning called?
  6. How do we perform training in meta learning?

Further reading


Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Understand the foundations of meta learning algorithms
  • Explore practical examples to explore various one-shot learning algorithms with its applications in TensorFlow
  • Master state of the art meta learning algorithms like MAML, reptile, meta SGD

Description

Meta learning is an exciting research trend in machine learning, which enables a model to understand the learning process. Unlike other ML paradigms, with meta learning you can learn from small datasets faster. Hands-On Meta Learning with Python starts by explaining the fundamentals of meta learning and helps you understand the concept of learning to learn. You will delve into various one-shot learning algorithms, like siamese, prototypical, relation and memory-augmented networks by implementing them in TensorFlow and Keras. As you make your way through the book, you will dive into state-of-the-art meta learning algorithms such as MAML, Reptile, and CAML. You will then explore how to learn quickly with Meta-SGD and discover how you can perform unsupervised learning using meta learning with CACTUs. In the concluding chapters, you will work through recent trends in meta learning such as adversarial meta learning, task agnostic meta learning, and meta imitation learning. By the end of this book, you will be familiar with state-of-the-art meta learning algorithms and able to enable human-like cognition for your machine learning models.

What you will learn

Understand the basics of meta learning methods, algorithms, and types Build voice and face recognition models using a siamese network Learn the prototypical network along with its variants Build relation networks and matching networks from scratch Implement MAML and Reptile algorithms from scratch in Python Work through imitation learning and adversarial meta learning Explore task agnostic meta learning and deep meta learning
Estimated delivery fee Deliver to New Zealand

Standard delivery 10 - 13 business days

NZ$20.95

Premium delivery 5 - 8 business days

NZ$74.95
(Includes tracking information)

Product Details

Country selected

Publication date : Dec 31, 2018
Length 226 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781789534207
Category :

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Black & white paperback book shipped to your address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Buy Now
Estimated delivery fee Deliver to New Zealand

Standard delivery 10 - 13 business days

NZ$20.95

Premium delivery 5 - 8 business days

NZ$74.95
(Includes tracking information)

Product Details


Publication date : Dec 31, 2018
Length 226 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781789534207
Category :

Table of Contents

17 Chapters
Title Page Chevron down icon Chevron up icon
Dedication Chevron down icon Chevron up icon
About Packt Chevron down icon Chevron up icon
Contributors Chevron down icon Chevron up icon
Preface Chevron down icon Chevron up icon
1. Introduction to Meta Learning Chevron down icon Chevron up icon
2. Face and Audio Recognition Using Siamese Networks Chevron down icon Chevron up icon
3. Prototypical Networks and Their Variants Chevron down icon Chevron up icon
4. Relation and Matching Networks Using TensorFlow Chevron down icon Chevron up icon
5. Memory-Augmented Neural Networks Chevron down icon Chevron up icon
6. MAML and Its Variants Chevron down icon Chevron up icon
7. Meta-SGD and Reptile Chevron down icon Chevron up icon
8. Gradient Agreement as an Optimization Objective Chevron down icon Chevron up icon
9. Recent Advancements and Next Steps Chevron down icon Chevron up icon
1. Assessments Chevron down icon Chevron up icon
2. Other Books You May Enjoy Chevron down icon Chevron up icon
Index Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Empty star icon Empty star icon Empty star icon Empty star icon Empty star icon 0
(0 Ratings)
5 star 0%
4 star 0%
3 star 0%
2 star 0%
1 star 0%
Top Reviews
No reviews found
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is the delivery time and cost of print book? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is custom duty/charge? Chevron down icon Chevron up icon

Customs duty are charges levied on goods when they cross international borders. It is a tax that is imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order? Chevron down icon Chevron up icon

The orders shipped to the countries that are listed under EU27 will not bear custom charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea:

A custom duty or localized taxes may be applicable on the shipment and would be charged by the recipient country outside of the EU27 which should be paid by the customer and these duties are not included in the shipping charges been charged on the order.

How do I know my custom duty charges? Chevron down icon Chevron up icon

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order? Chevron down icon Chevron up icon

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except for the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or material defect in book), Packt Publishing will not accept returns.

What is your returns and refunds policy? Chevron down icon Chevron up icon

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on customercare@packt.com within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on customercare@packt.com who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items.(damaged, defective or incorrect)
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged, with book material defect, contact our Customer Relation Team on customercare@packt.com within 14 days of receipt of the book with appropriate evidence of damage and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner which is on a print-on-demand basis.

What tax is charged? Chevron down icon Chevron up icon

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use? Chevron down icon Chevron up icon

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal
What is the delivery time and cost of print books? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela