You're reading from Hands-On Machine Learning for Algorithmic Trading Design and implement investment strategies based on smart algorithms that learn from data using Python

Product type Paperback

Published in Dec 2018

Publisher Packt

ISBN-13 9781789346411

Length 684 pages

Edition 1st Edition

Languages

Python

Tools

TensorFlow

Concepts

Design

Authors (2):

Yau

Stefan Jansen


ML and algorithmic trading strategies

Quantitative strategies have evolved and become more sophisticated in three waves:

In the 1980s and 1990s, signals often emerged from academic research and used a single or very few inputs derived from market and fundamental data. These signals, such as basic mean-reversion strategies, are now largely commoditized and available as ETFs.
In the 2000s, factor-based investing proliferated. Funds used algorithms to identify assets exposed to risk factors like value or momentum to seek arbitrage opportunities. Redemptions during the early days of the financial crisis triggered the quant quake of August 2007 that cascaded through the factor-based fund industry. These strategies are now also available as long-only smart-beta funds that tilt portfolios according to a given set of risk factors.
The third era is driven by investments in ML capabilities and alternative data to generate profitable signals for repeatable trading strategies. Factor decay is a major challenge: the excess returns from new anomalies have been shown to drop by a quarter from discovery to publication, and by over 50% after publication due to competition and crowding.

There are several categories of trading strategies that use algorithms to execute trading rules:

Short-term trades that aim to profit from small price movements, for example, due to arbitrage
Behavioral strategies that aim to capitalize on anticipating the behavior of other market participants
Programs that aim to optimize trade execution
A large group of trading strategies based on predicted pricing

The HFT funds discussed above most prominently rely on short holding periods to benefit from minor price movements based on bid-ask arbitrage or statistical arbitrage. Behavioral algorithms usually operate in lower liquidity environments and aim to anticipate moves by a larger player likely to significantly impact the price. The expectation of the price impact is based on sniffing algorithms that generate insights into other market participants' strategies, or market patterns such as forced trades by ETFs.

Trade-execution programs aim to limit the market impact of trades. They range from the simple slicing of trades to match the time-weighted average price (TWAP) or volume-weighted average price (VWAP) to more sophisticated approaches. Simple algorithms leverage historical patterns, whereas more sophisticated algorithms take into account transaction costs, implementation shortfall, or predicted price movements. These algorithms can operate at the security or portfolio level, for example, to implement multileg derivative or cross-asset trades.
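The simple slicing described above can be sketched as follows. This is a minimal illustration rather than a production execution algorithm: the function names twap_slices and vwap_slices and the U-shaped volume profile are hypothetical, and a real VWAP schedule would estimate the profile from historical intraday volume.

```python
from typing import List


def twap_slices(total_qty: int, n_slices: int) -> List[int]:
    """Split an order into near-equal child orders, one per time interval."""
    base, remainder = divmod(total_qty, n_slices)
    # Give one extra share to the first `remainder` slices so the total is exact.
    return [base + (1 if i < remainder else 0) for i in range(n_slices)]


def vwap_slices(total_qty: int, volume_profile: List[float]) -> List[int]:
    """Split an order in proportion to an expected intraday volume profile."""
    total_volume = sum(volume_profile)
    raw = [total_qty * v / total_volume for v in volume_profile]
    slices = [int(x) for x in raw]  # round each slice down first
    # Assign the leftover shares to the buckets with the largest fractional part.
    leftover = total_qty - sum(slices)
    by_fraction = sorted(
        range(len(raw)), key=lambda i: raw[i] - slices[i], reverse=True
    )
    for i in by_fraction[:leftover]:
        slices[i] += 1
    return slices


# Hypothetical profile: heavier volume near the open and the close.
profile = [30, 15, 10, 15, 30]
print("TWAP:", twap_slices(1000, 6))
print("VWAP:", vwap_slices(1000, profile))
```

Both schedules sum to the parent order size. The TWAP schedule ignores volume entirely, while the VWAP schedule concentrates child orders where liquidity is expected to be highest.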

Use Cases of ML for Trading

ML extracts signals from a wide range of market, fundamental, and alternative data, and can be applied at all steps of the algorithmic trading-strategy process. Key applications include:

Data mining to identify patterns and extract features
Supervised learning to generate risk factors or alphas and create trade ideas
Aggregation of individual signals into a strategy
Allocation of assets according to risk profiles learned by an algorithm
The testing and evaluation of strategies, including through the use of synthetic data
The interactive, automated refinement of a strategy using reinforcement learning

We briefly highlight some of these applications and identify where we will demonstrate their use in later chapters.

Data mining for feature extraction

The cost-effective evaluation of large, complex datasets requires the detection of signals at scale. There are several examples throughout the book:

Information theory is a useful tool to extract features that capture potential signals and can be used in ML models. In Chapter 4, Alpha Factor Research, we use mutual information to assess the potential value of individual features for a supervised learning algorithm to predict asset returns.
In Chapter 12, Unsupervised Learning, we introduce various techniques to create features from high-dimensional datasets. In Chapter 14, Topic Modeling, we apply these techniques to text data.
We emphasize model-specific ways to gain insights into the predictive power of individual variables. We use a novel game-theoretic approach called SHapley Additive exPlanations (SHAP) to attribute predictive performance to individual features in complex gradient boosting machines with a large number of input variables.
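To make the mutual information idea concrete, the sketch below scores two synthetic features against synthetic returns. It uses a simple binned plug-in estimator written with the standard library only, which is an assumption made for illustration and not necessarily the estimator used in Chapter 4. The helper names and the data-generating process are hypothetical.

```python
import math
import random
from collections import Counter


def quantile_bins(values, n_bins):
    """Assign each value to one of n_bins equal-frequency bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y, n_bins=5):
    """Plug-in estimate of I(X; Y) in nats from binned samples."""
    bx, by = quantile_bins(x, n_bins), quantile_bins(y, n_bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # Sum p(i, j) * log(p(i, j) / (p(i) * p(j))) over observed bin pairs.
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


random.seed(0)
n_obs = 2000
informative = [random.gauss(0, 1) for _ in range(n_obs)]
noise_feature = [random.gauss(0, 1) for _ in range(n_obs)]
# Synthetic returns: a non-linear function of the informative feature plus noise.
returns = [f ** 2 + 0.5 * random.gauss(0, 1) for f in informative]

mi_informative = mutual_information(informative, returns)
mi_noise = mutual_information(noise_feature, returns)
print(f"informative feature: {mi_informative:.3f} nats")
print(f"noise feature:       {mi_noise:.3f} nats")
```

The relationship here is quadratic, so the linear correlation between the informative feature and the returns is close to zero, yet mutual information still separates it from the pure-noise feature. That is the property that makes it useful for screening candidate features.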

Supervised learning for alpha factor creation and aggregation

The main rationale for applying ML to trading is to obtain predictions of asset fundamentals, price movements or market conditions. A strategy can leverage multiple ML algorithms that build on each other. Downstream models can generate signals at the portfolio level by integrating predictions about the prospects of individual assets, capital market expectations, and the correlation among securities. Alternatively, ML predictions can inform discretionary trades as in the quantamental approach outlined above. ML predictions can also target specific risk factors, such as value or volatility, or implement technical approaches, such as trend following or mean reversion:

In Chapter 3, Alternative Data for Finance, we illustrate how to work with fundamental data to create inputs to ML-driven valuation models.
In Chapter 13, Working with Text Data, Chapter 14, Topic Modeling, and Chapter 15, Word Embeddings, we use alternative data on business reviews that can be used to project revenues for a company as an input for a valuation exercise.
In Chapter 8, Time Series Models, we demonstrate how to forecast macro variables as inputs to market expectations and how to forecast risk factors such as volatility.
In Chapter 18, Recurrent Neural Networks, we introduce recurrent neural networks (RNNs) that can achieve superior performance with non-linear time series data.

Asset allocation

ML has been used to allocate portfolios based on decision-tree models that compute a hierarchical form of risk parity. As a result, risk characteristics are driven by patterns in asset prices rather than by asset classes, and the resulting portfolios can achieve superior risk-return characteristics.

In Chapter 5, Strategy Evaluation, and Chapter 12, Unsupervised Learning, we illustrate how hierarchical clustering extracts data-driven risk classes that better reflect correlation patterns than conventional asset class definitions.
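The clustering step can be illustrated with a small sketch. It groups four synthetic assets by a correlation-based distance using single-linkage agglomerative clustering, written with the standard library only. The asset names, the factor structure, and the distance sqrt(0.5 * (1 - correlation)) are assumptions chosen for illustration, and the sketch covers only the clustering, not the subsequent risk-parity allocation.

```python
import math
import random


def correlation(a, b):
    """Pearson correlation of two equally long return series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cluster_assets(returns, n_clusters):
    """Single-linkage agglomerative clustering on a correlation distance."""
    names = list(returns)
    dist = {
        (a, b): math.sqrt(max(0.0, 0.5 * (1 - correlation(returns[a], returns[b]))))
        for a in names for b in names if a != b
    }
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        # Merge the two clusters whose closest members are nearest to each other.
        i, j = min(
            pairs,
            key=lambda ij: min(
                dist[a, b] for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return [sorted(c) for c in clusters]


random.seed(1)
n_days = 500
equity_factor = [random.gauss(0, 1) for _ in range(n_days)]
rates_factor = [random.gauss(0, 1) for _ in range(n_days)]


def noisy(factor):
    """Hypothetical asset returns: common factor plus idiosyncratic noise."""
    return [f + 0.3 * random.gauss(0, 1) for f in factor]


returns = {
    "stock_a": noisy(equity_factor),
    "stock_b": noisy(equity_factor),
    "bond_a": noisy(rates_factor),
    "bond_b": noisy(rates_factor),
}
clusters = cluster_assets(returns, n_clusters=2)
print(clusters)
```

The clusters are recovered from the return data alone, without any asset class labels, which is the sense in which the resulting risk classes are data-driven.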

Testing trade ideas

Backtesting is a critical step to select successful algorithmic trading strategies. Cross-validation using synthetic data is a key ML technique to generate reliable out-of-sample results when combined with appropriate methods to correct for multiple testing. The time series nature of financial data requires modifications to the standard approach to avoid look-ahead bias or other contamination of the data used for training, validation, and testing. In addition, the limited availability of historical data has given rise to alternative approaches that use synthetic data:

We will demonstrate various methods to test ML models using market, fundamental, and alternative data that obtain sound estimates of out-of-sample errors.
In Chapter 20, Autoencoders and Generative Adversarial Nets, we present generative adversarial networks (GANs) that are capable of producing high-quality synthetic data.
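One common modification of standard cross-validation for time series is a walk-forward split in which the training data always precede the test data. The sketch below is a minimal version under stated assumptions: the function name walk_forward_splits and the gap parameter are hypothetical, and the gap stands in for the number of observations to drop when labels are built from forward returns that would otherwise overlap the test window.

```python
def walk_forward_splits(n_obs, n_splits, test_size, gap=0):
    """Yield (train_indices, test_indices) with training strictly before testing.

    `gap` observations between the end of the training window and the start
    of the test window are excluded from both sets.
    """
    for k in range(n_splits):
        test_end = n_obs - k * test_size
        test_start = test_end - test_size
        train_end = test_start - gap
        if train_end <= 0:
            break  # not enough history left for a training window
        yield list(range(train_end)), list(range(test_start, test_end))


splits = list(walk_forward_splits(n_obs=100, n_splits=3, test_size=10, gap=5))
for train, test in splits:
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

Unlike a shuffled k-fold split, no observation in a training set is dated after any observation in the matching test set, so the model never sees information from the future of the period on which it is evaluated.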

Reinforcement learning

Trading takes place in a competitive, interactive marketplace. Reinforcement learning aims to train agents to learn a policy function based on rewards.

In Chapter 21, Reinforcement Learning, we present key reinforcement learning algorithms like Q-Learning and the Dyna architecture and demonstrate the training of reinforcement learning algorithms for trading using OpenAI's gym environment.
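As a rough illustration of how an agent learns a policy from rewards, the sketch below runs tabular Q-learning on a toy five-state chain. The environment is hand-written and hypothetical. It is not a trading environment and does not use OpenAI's gym, and the hyperparameters are arbitrary choices for the example.

```python
import random

N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Toy environment: reward 1 only when the agent reaches the right-most state."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, and break ties at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next-state value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy for every non-terminal state.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)
```

The agent is never told that moving right is correct. The preference emerges because the reward at the goal state propagates backward through the Q-table via the discounted target.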