Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Machine Learning for Time-Series with Python

You're reading from   Machine Learning for Time-Series with Python Forecast, predict, and detect anomalies with state-of-the-art machine learning methods

Arrow left icon
Product type Paperback
Published in Oct 2021
Publisher Packt
ISBN-13 9781801819626
Length 370 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Ben Auffarth Ben Auffarth
Author Profile Icon Ben Auffarth
Ben Auffarth
Arrow right icon
View More author details
Toc

Table of Contents (15) Chapters Close

Preface 1. Introduction to Time-Series with Python 2. Time-Series Analysis with Python FREE CHAPTER 3. Preprocessing Time-Series 4. Introduction to Machine Learning for Time-Series 5. Forecasting with Moving Averages and Autoregressive Models 6. Unsupervised Methods for Time-Series 7. Machine Learning Models for Time-Series 8. Online Learning for Time-Series 9. Probabilistic Models for Time-Series 10. Deep Learning for Time-Series 11. Reinforcement Learning for Time-Series 12. Multivariate Forecasting 13. Other Books You May Enjoy
14. Index

Bandit algorithms

A Multi-Armed Bandit (MAB) is a classic reinforcement learning problem, in which a player is faced with a slot machine (bandit) that has k levers (arms), each with a different reward distribution. The agent's goal is to maximize its cumulative reward on a trial-by-trial basis. Since MABs are a simple but powerful framework for algorithms that make decisions over time under uncertainty, a large number of research articles have been dedicated to them.

Bandit learning refers to algorithms that aim to optimize a single unknown stationary objective function. An agent chooses an action from a set of actions . The environment reveals reward of the chosen action at time t. As information is accumulated over multiple rounds, the agent can build a good representation of the value (or reward) distribution for each arm, .

Therefore, a good policy might converge so that the choice of arm becomes optimal. According to one policy, UCB1 (published by Peter Auer, Nicol...

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image