Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Hands-On Machine Learning for Algorithmic Trading

You're reading from   Hands-On Machine Learning for Algorithmic Trading Design and implement investment strategies based on smart algorithms that learn from data using Python

Arrow left icon
Product type Paperback
Published in Dec 2018
Publisher Packt
ISBN-13 9781789346411
Length 684 pages
Edition 1st Edition
Languages
Concepts
Arrow right icon
Authors (2):
Arrow left icon
Jeffrey Yau Jeffrey Yau
Author Profile Icon Jeffrey Yau
Jeffrey Yau
Stefan Jansen Stefan Jansen
Author Profile Icon Stefan Jansen
Stefan Jansen
Arrow right icon
View More author details
Toc

Table of Contents (23) Chapters Close

Preface 1. Machine Learning for Trading 2. Market and Fundamental Data FREE CHAPTER 3. Alternative Data for Finance 4. Alpha Factor Research 5. Strategy Evaluation 6. The Machine Learning Process 7. Linear Models 8. Time Series Models 9. Bayesian Machine Learning 10. Decision Trees and Random Forests 11. Gradient Boosting Machines 12. Unsupervised Learning 13. Working with Text Data 14. Topic Modeling 15. Word Embeddings 16. Deep Learning 17. Convolutional Neural Networks 18. Recurrent Neural Networks 19. Autoencoders and Generative Adversarial Nets 20. Reinforcement Learning 21. Next Steps 22. Other Books You May Enjoy

ML and algorithmic trading strategies

Quantitative strategies have evolved and become more sophisticated in three waves:

  1. In the 1980s and 1990s, signals often emerged from academic research and used a single or very few inputs derived from market and fundamental data. These signals are now largely commoditized and available as ETF, such as basic mean-reversion strategies.
  2. In the 2000s, factor-based investing proliferated. Funds used algorithms to identify assets exposed to risk factors like value or momentum to seek arbitrage opportunities. Redemptions during the early days of the financial crisis triggered the quant quake of August 2007 that cascaded through the factor-based fund industry. These strategies are now also available as long-only smart-beta funds that tilt portfolios according to a given set of risk factors.
  3. The third era is driven by investments in ML capabilities and alternative data to generate profitable signals for repeatable trading strategies. Factor decay is a major challenge: the excess returns from new anomalies have been shown to drop by a quarter from discovery to publication, and by over 50% after publication due to competition and crowding.

There are several categories of trading strategies that use algorithms to execute trading rules:

  • Short-term trades that aim to profit from small price movements, for example, due to arbitrage
  • Behavioral strategies that aim to capitalize on anticipating the behavior of other market participants
  • Programs that aim to optimize trade execution, and
  • A large group of trading based on predicted pricing

The HFT funds discussed above most prominently rely on short holding periods to benefit from minor price movements based on bid-ask arbitrage or statistical arbitrage. Behavioral algorithms usually operate in lower liquidity environments and aim to anticipate moves by a larger player likely to significantly impact the price. The expectation of the price impact is based on sniffing algorithms that generate insights into other market participants' strategies, or market patterns such as forced trades by ETFs.

Trade-execution programs aim to limit the market impact of trades and range from the simple slicing of trades to match time-weighted average pricing (TWAP) or volume-weighted average pricing (VWAP). Simple algorithms leverage historical patterns, whereas more sophisticated algorithms take into account transaction costs, implementation shortfall or predicted price movements. These algorithms can operate at the security or portfolio level, for example, to implement multileg derivative or cross-asset trades.

Use Cases of ML for Trading

ML extracts signals from a wide range of market, fundamental, and alternative data, and can be applied at all steps of the algorithmic trading-strategy process. Key applications include:

  • Data mining to identify patterns and extract features
  • Supervised learning to generate risk factors or alphas and create trade ideas
  • Aggregation of individual signals into a strategy
  • Allocation of assets according to risk profiles learned by an algorithm
  • The testing and evaluation of strategies, including through the use of synthetic data
  • The interactive, automated refinement of a strategy using reinforcement learning

We briefly highlight some of these applications and identify where we will demonstrate their use in later chapters.

Data mining for feature extraction

The cost-effective evaluation of large, complex datasets requires the detection of signals at scale. There are several examples throughout the book:

  • Information theory is a useful tool to extract features that capture potential signals and can be used in ML models. In Chapter 4, Alpha Factor Research we use mutual information to assess the potential values of individual features for a supervised learning algorithm to predict asset returns.
  • In Chapter 12, Unsupervised Learning, we introduce various techniques to create features from high-dimensional datasets. In Chapter 14, Topic Modeling, we apply these techniques to text data.
  • We emphasize model-specific ways to gain insights into the predictive power of individual variables. We use a novel game-theoretic approach called SHapley Additive exPlanations (SHAP) to attribute predictive performance to individual features in complex Gradient Boosting machines with a large number of input variables.

Supervised learning for alpha factor creation and aggregation

The main rationale for applying ML to trading is to obtain predictions of asset fundamentals, price movements or market conditions. A strategy can leverage multiple ML algorithms that build on each other. Downstream models can generate signals at the portfolio level by integrating predictions about the prospects of individual assets, capital market expectations, and the correlation among securities. Alternatively, ML predictions can inform discretionary trades as in the quantamental approach outlined above. ML predictions can also target specific risk factors, such as value or volatility, or implement technical approaches, such as trend following or mean reversion:

  • In Chapter 3, Alternative Data for Finance, we illustrate how to work with fundamental data to create inputs to ML-driven valuation models
  • In Chapter 13, Working with Text Data, Chapter 14, Topic Modeling, and Chapter 15, Word Embeddings we use alternative data on business reviews that can be used to project revenues for a company as an input for a valuation exercise.
  • In Chapter 8, Time Series Models, we demonstrate how to forecast macro variables as inputs to market expectations and how to forecast risk factors such as volatility
  • In Chapter 18, Recurrent Neural Networks we introduce recurrent neural networks (RNNs) that achieve superior performance with non-linear time series data.

Asset allocation

ML has been used to allocate portfolios based on decision-tree models that compute a hierarchical form of risk parity. As a result, risk characteristics are driven by patterns in asset prices rather than by asset classes and achieve superior risk-return characteristics.

In Chapter 5, Strategy Evaluation and Chapter 12, Unsupervised Learning, we illustrate how hierarchical clustering extracts data-driven risk classes that better reflect correlation patterns than conventional asset class definition.

Testing trade ideas

Backtesting is a critical step to select successful algorithmic trading strategies. Cross-validation using synthetic data is a key ML technique to generate reliable out-of-sample results when combined with appropriate methods to correct for multiple testing. The time series nature of financial data requires modifications to the standard approach to avoid look-ahead bias or otherwise contaminate the data used for training, validation, and testing. In addition, the limited availability of historical data has given rise to alternative approaches that use synthetic data:

  • We will demonstrate various methods to test ML models using market, fundamental, and alternative that obtain sound estimates of out-of-sample errors.
  • In Chapter 20, Autoencoders and Generative Adversarial Nets, we present GAN that are capable of producing high-quality synthetic data.

Reinforcement learning

Trading takes place in a competitive, interactive marketplace. Reinforcement learning aims to train agents to learn a policy function based on rewards.

  • In Chapter 21, Reinforcement Learning we present key reinforcement algorithms like Q-Learning and the Dyna architecture and demonstrate the training of reinforcement algorithms for trading using OpenAI's gym environment.
lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image