Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Machine Learning Using TensorFlow Cookbook

You're reading from   Machine Learning Using TensorFlow Cookbook Create powerful machine learning algorithms with TensorFlow

Arrow left icon
Product type Paperback
Published in Feb 2021
Publisher Packt
ISBN-13 9781800208865
Length 416 pages
Edition 1st Edition
Languages
Arrow right icon
Authors (3):
Arrow left icon
Konrad Banachewicz Konrad Banachewicz
Author Profile Icon Konrad Banachewicz
Konrad Banachewicz
Luca Massaron Luca Massaron
Author Profile Icon Luca Massaron
Luca Massaron
Alexia Audevart Alexia Audevart
Author Profile Icon Alexia Audevart
Alexia Audevart
Arrow right icon
View More author details
Toc

Table of Contents (15) Chapters Close

Preface 1. Getting Started with TensorFlow 2.x 2. The TensorFlow Way FREE CHAPTER 3. Keras 4. Linear Regression 5. Boosted Trees 6. Neural Networks 7. Predicting with Tabular Data 8. Convolutional Neural Networks 9. Recurrent Neural Networks 10. Transformers 11. Reinforcement Learning with TensorFlow and TF-Agents 12. Taking TensorFlow to Production 13. Other Books You May Enjoy
14. Index

MAB

In probability theory, a multi-armed bandit (MAB) problem refers to a situation where a limited set of resources must be allocated between competing choices in such a manner that some form of long-term objective is maximized. The name originated from the analogy that was used to formulate the first version of the model. Imagine we have a gambler facing a row of slot machines who has to decide which ones to play, how many times, and in what order. In RL, we formulate it as an agent that wants to balance exploration (acquisition of new knowledge) and exploitation (optimizing decisions based on experience already acquired). The objective of this balancing is the maximization of a total reward over a period of time.

An MAB is a simplified RL problem: an action taken by the agent does not influence the subsequent state of the environment. This means that there is no need to model state transitions, credit rewards to past actions, or plan ahead to get to rewarding states. The goal...

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime