
You're reading from Hands-On Machine Learning for Algorithmic Trading: Design and implement investment strategies based on smart algorithms that learn from data using Python

Product type Paperback

Published in Dec 2018

Publisher Packt

ISBN-13 9781789346411

Length 684 pages

Edition 1st Edition

Languages

Python

Tools

TensorFlow

Concepts

Design

Authors (2):

Jeffrey Yau

Stefan Jansen


Chapter 1, Machine Learning for Trading, identifies the focus of the book by outlining how ML matters in generating and evaluating signals for the design and execution of a trading strategy. It outlines the strategy process from hypothesis generation and modeling, data selection, and backtesting to evaluation and execution in a portfolio context, including risk management.

Chapter 2, Market and Fundamental Data, covers where to source and how to work with original exchange-provided tick data and financial reporting data, as well as how to access the numerous open-source data providers that we will rely on throughout this book.

Chapter 3, Alternative Data for Finance, provides categories and criteria to assess the exploding number of sources and providers. It also demonstrates how to create alternative data sets by scraping websites, for example to collect earnings call transcripts for use with natural language processing (NLP) and sentiment analysis algorithms in the second part of the book.

Chapter 4, Alpha Factor Research, provides a framework for understanding how factors work and how to measure their performance, for example using the information coefficient (IC). It demonstrates how to engineer alpha factors from data using Python libraries offline and on the Quantopian platform. It also introduces the zipline library to backtest factors and the alphalens library to evaluate their predictive power.
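To give a flavor of the information coefficient mentioned above, here is a minimal sketch that computes it as the Spearman rank correlation between factor values and subsequent returns for a single cross-section of assets. The function names and the sample numbers are hypothetical and chosen only for illustration; they are not taken from the book or from alphalens, which provides far more complete tooling.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def information_coefficient(factor_values, forward_returns):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(factor_values), average_ranks(forward_returns))


# Hypothetical cross-section of five assets on one date.
factor = [0.8, -0.2, 0.5, 0.1, -0.6]
returns = [0.03, -0.01, 0.01, 0.02, -0.02]
ic = information_coefficient(factor, returns)
print(round(ic, 4))
```

An IC close to 1 means the factor ranked the assets almost exactly as their later returns did; a value near 0 suggests no predictive ordering.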

Chapter 5, Strategy Evaluation, introduces how to build, test and evaluate trading strategies using historical data with zipline offline and on the Quantopian platform. It demonstrates how to compute portfolio performance and risk metrics using the pyfolio library. It also addresses how to manage the methodological challenges of strategy backtests and introduces methods to optimize a strategy from a portfolio risk perspective.
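As a rough idea of what such performance and risk metrics look like, the sketch below computes two common ones by hand: an annualized Sharpe ratio and the maximum drawdown. It assumes a zero risk-free rate and 252 trading periods per year, and the function names and return series are hypothetical; this is not pyfolio's implementation.

```python
from statistics import mean, stdev


def annualized_sharpe(period_returns, periods_per_year=252):
    """Mean return over sample standard deviation, scaled to a yearly horizon.

    Assumes a zero risk-free rate (an illustrative simplification).
    """
    return mean(period_returns) / stdev(period_returns) * periods_per_year ** 0.5


def max_drawdown(period_returns):
    """Largest peak-to-trough decline of cumulative wealth (0 or negative)."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in period_returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = min(worst, wealth / peak - 1.0)
    return worst


# Hypothetical return series, exaggerated so the drawdown is easy to follow.
returns = [0.10, -0.50, 0.20]
print(max_drawdown(returns))
print(annualized_sharpe([0.01, 0.03]))
```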

Chapter 6, Machine Learning Workflow, sets the stage by outlining how to formulate, train, tune and evaluate the predictive performance of ML models as a systematic workflow.

Chapter 7, Linear Models, shows how to use linear and logistic regression for inference and prediction and how to use regularization to manage the risk of overfitting. It presents the Quantopian trading platform and demonstrates how to build factor models and predict asset prices.
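To illustrate how regularization shrinks coefficients, here is a minimal sketch of ridge regression in its simplest form: one feature, no intercept, where the penalized least-squares estimate has the closed form beta = sum(x*y) / (sum(x*x) + penalty). The function name and the data are hypothetical; in practice you would use a library estimator rather than this hand-rolled version.

```python
def ridge_slope(xs, ys, penalty):
    """Closed-form ridge estimate for y = beta * x (single feature, no intercept).

    Minimizes sum((y - beta * x) ** 2) + penalty * beta ** 2.
    """
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) + penalty
    return numerator / denominator


# Hypothetical data lying exactly on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

ols_beta = ridge_slope(xs, ys, penalty=0.0)     # no penalty: ordinary least squares
ridge_beta = ridge_slope(xs, ys, penalty=14.0)  # penalty pulls the slope toward zero
print(ols_beta, ridge_beta)
```

A larger penalty yields a smaller coefficient, trading some bias for lower variance, which is the mechanism used to limit overfitting.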

Chapter 8, Time Series Models, covers univariate and multivariate time series, including vector autoregressive models and cointegration tests, and how they can be applied to pairs trading strategies.

Chapter 9, Bayesian Machine Learning, presents how to formulate probabilistic models and how Markov Chain Monte Carlo (MCMC) sampling and Variational Bayes facilitate approximate inference. It also illustrates how to use PyMC3 for probabilistic programming to gain deeper insights into parameter and model uncertainty.

Chapter 10, Decision Trees and Random Forests, shows how to build, train and tune non-linear tree-based models for insight and prediction. It introduces tree-based ensemble models and shows how random forests use bootstrap aggregation to overcome some of the weaknesses of decision trees.
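The idea behind bootstrap aggregation (bagging) can be sketched in a few lines: fit many simple trees, each on a resample of the data drawn with replacement, and average their predictions. The example below uses one-split regression trees (stumps) on a single feature. All names and data are hypothetical, and a real random forest additionally randomizes the features considered at each split, which this sketch omits.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # Every x in the sample is identical: predict the overall mean.
        overall = mean(ys)
        return lambda x: overall
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean


def fit_bagged_stumps(xs, ys, n_estimators=25, seed=0):
    """Train each stump on a bootstrap sample and average the predictions."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean([stump(x) for stump in stumps])


# Hypothetical step-shaped data: y jumps from 0 to 1 at x = 5.
xs = list(range(10))
ys = [0.0] * 5 + [1.0] * 5
predict = fit_bagged_stumps(xs, ys)
print(predict(0), predict(9))
```

Because each stump sees a slightly different sample, the individual split points vary, and averaging them smooths out the instability of any single tree.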

Chapter 11, Gradient Boosting Machines, demonstrates how to use the libraries xgboost, lightgbm, and catboost for high-performance training and prediction, and reviews in depth how to tune the numerous hyperparameters.

Chapter 12, Unsupervised Learning, introduces how to use dimensionality reduction and clustering for algorithmic trading. It uses principal and independent component analysis to extract data-driven risk factors. It presents several clustering techniques and demonstrates the use of hierarchical clustering for asset allocation.

Chapter 13, Working with Text Data, demonstrates how to convert text data into a numerical format and applies the classification algorithms from part two for sentiment analysis to large datasets.
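One common way to convert text into a numerical format is a bag-of-words document-term matrix, where each row is a document and each column counts one vocabulary term. The sketch below builds one with the standard library only. The tokenizer, function names, and sample sentences are hypothetical simplifications and are not necessarily the approach the chapter itself takes.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text, replace punctuation with spaces, split on whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()


def document_term_matrix(documents):
    """Return the sorted vocabulary and one row of term counts per document."""
    token_lists = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for tokens in token_lists for tok in tokens})
    matrix = []
    for tokens in token_lists:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical snippets standing in for financial text.
docs = ["Revenue grew, and margins grew.", "Margins fell."]
vocabulary, matrix = document_term_matrix(docs)
print(vocabulary)
print(matrix)
```

Rows of such a matrix can then serve as feature vectors for a classifier, for example to label each document's sentiment.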

Chapter 14, Topic Modeling, applies Bayesian unsupervised learning to extract latent topics that can summarize a large number of documents and offer more effective ways to explore text data or use topics as features for a classification model. It demonstrates how to apply this technique to earnings call transcripts sourced in Chapter 3, Alternative Data for Finance, and to annual reports filed with the Securities and Exchange Commission (SEC).

Chapter 15, Word Embeddings, uses neural networks to learn state-of-the-art language features in the form of word vectors that capture semantic context much better than traditional text features and represent a very promising avenue for extracting trading signals from text data.

Chapter 16, Deep Learning, introduces Keras, TensorFlow, and PyTorch, the most popular deep learning frameworks, which we will use throughout part four. It also presents techniques for training and tuning, including regularization, and provides an overview of common architectures.

Chapter 17, Convolutional Neural Networks, covers CNNs, which are very powerful for classification tasks with unstructured data at scale. We will introduce successful architectural designs, train a CNN on satellite data (for example, to predict economic activity), and use transfer learning to speed up training.

Chapter 18, Recurrent Neural Networks, shows how RNNs are useful for sequence-to-sequence modeling, including for time series. It demonstrates how RNNs capture non-linear patterns over longer periods.

Chapter 19, Autoencoders and Generative Adversarial Nets, addresses unsupervised deep learning including autoencoders for non-linear compression of high-dimensional data and Generative Adversarial Networks (GANs), one of the most important recent innovations to generate synthetic data.

Chapter 20, Reinforcement Learning, presents reinforcement learning, which permits the design and training of agents that learn to optimize decisions over time in response to their environment. You will see how to build an agent that responds to market signals using the OpenAI Gym.

Chapter 21, Next Steps, is a summary of all the previous chapters.