Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Machine Learning for Time-Series with Python

You're reading from   Machine Learning for Time-Series with Python Forecast, predict, and detect anomalies with state-of-the-art machine learning methods

Arrow left icon
Product type Paperback
Published in Oct 2021
Publisher Packt
ISBN-13 9781801819626
Length 370 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Ben Auffarth Ben Auffarth
Author Profile Icon Ben Auffarth
Ben Auffarth
Arrow right icon
View More author details
Toc

Table of Contents (15) Chapters Close

Preface 1. Introduction to Time-Series with Python 2. Time-Series Analysis with Python FREE CHAPTER 3. Preprocessing Time-Series 4. Introduction to Machine Learning for Time-Series 5. Forecasting with Moving Averages and Autoregressive Models 6. Unsupervised Methods for Time-Series 7. Machine Learning Models for Time-Series 8. Online Learning for Time-Series 9. Probabilistic Models for Time-Series 10. Deep Learning for Time-Series 11. Reinforcement Learning for Time-Series 12. Multivariate Forecasting 13. Other Books You May Enjoy
14. Index

Deep Q-Learning

Q-learning, introduced by Chris Watkins in 1989, is an algorithm to learn the value of an action in a particular state. Q-learning revolves around representing the expected rewards for an action taken in a given state.

The expected reward of the state-action combination is approximated by the Q function:

Q is initialized to a fixed value, usually at random. At each time step t, the agent selects an action and sees a new state of the environment as a consequence and receives a reward.

The value function Q can then be updated according to the Bellman equation as the weighted average of the old value and the new information:

The weight is by , the learning rate – the higher the learning rate, the more adaptive the Q-function. The discount factor is weighting the rewards by their immediacy – the higher , the more impatient (myopic) the agent becomes.

represents the current reward. is the reward obtained by weighted by...

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image