Machine Learning for Algorithmic Trading: Predictive models to extract signals from market and alternative data for systematic trading strategies with Python

Product type Paperback

Published in Jul 2020

Publisher Packt

ISBN-13 9781839217715

Length 820 pages

Edition 2nd Edition

Languages

Python

Tools

TensorFlow

Concepts

Machine Learning

Author (1):

Stefan Jansen

Table of Contents (27) Chapters

Preface

1. Machine Learning for Trading – From Idea to Execution

2. Market and Fundamental Data – Sources and Techniques

3. Alternative Data for Finance – Categories and Use Cases

4. Financial Feature Engineering – How to Research Alpha Factors

5. Portfolio Optimization and Performance Evaluation

6. The Machine Learning Process

7. Linear Models – From Risk Factors to Return Forecasts

8. The ML4T Workflow – From Model to Strategy Backtesting

9. Time-Series Models for Volatility Forecasts and Statistical Arbitrage

10. Bayesian ML – Dynamic Sharpe Ratios and Pairs Trading

11. Random Forests – A Long-Short Strategy for Japanese Stocks

12. Boosting Your Trading Strategy

13. Data-Driven Risk Factors and Asset Allocation with Unsupervised Learning

14. Text Data for Trading – Sentiment Analysis

15. Topic Modeling – Summarizing Financial News

16. Word Embeddings for Earnings Calls and SEC Filings

17. Deep Learning for Trading

18. CNNs for Financial Time Series and Satellite Images

19. RNNs for Multivariate Time Series and Sentiment Analysis

20. Autoencoders for Conditional Risk Factors and Asset Pricing

21. Generative Adversarial Networks for Synthetic Time-Series Data

22. Deep Reinforcement Learning – Building a Trading Agent

23. Conclusions and Next Steps

24. References

25. Index

Appendix: Alpha Factor Library

To get the most out of this book

In addition to the content summarized in the previous section, the book's hands-on material comprises over 160 Jupyter notebooks, hosted on GitHub, that demonstrate the use of ML for trading in practice on a broad range of data sources. This section describes how to use the GitHub repository, obtain the data used in the numerous examples, and set up the environment to run the code.

The GitHub repository

The book revolves around the application of ML algorithms to trading. The hands-on aspects are covered in Jupyter notebooks, hosted on GitHub, that illustrate many of the concepts and models in more detail. While the chapters aim to be self-contained, the code examples and results often take up too much space to include in full. Therefore, it is important to consult the notebooks, which contain significant additional content, while reading each chapter, even if you do not intend to run the code yourself.

The repository is organized so that each chapter has its own directory containing the relevant notebooks and a README file containing separate instructions where needed, as well as references and resources specific to the chapter's content. The relevant notebooks are identified throughout each chapter, as necessary. The repository also contains instructions on how to install the requisite libraries and obtain the data.

You can find the code files at: https://github.com/PacktPublishing/Machine-Learning-for-Algorithmic-Trading-Second-Edition.
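Once the repository has been cloned locally, the layout described above (one directory per chapter, each holding its notebooks) can be inspected with a few lines of Python. The following is a minimal sketch, not part of the book's code: the function name notebooks_by_directory is hypothetical, and it assumes only that the chapter directories sit at the top level of the clone.

```python
from pathlib import Path


def notebooks_by_directory(repo_root):
    """Map each top-level directory of a local clone to its notebook paths.

    Assumption: chapter directories sit directly under repo_root.
    Hidden directories (such as .git) and Jupyter checkpoint copies are skipped.
    """
    root = Path(repo_root)
    result = {}
    for directory in sorted(p for p in root.iterdir() if p.is_dir()):
        if directory.name.startswith("."):
            continue
        notebooks = sorted(
            nb.relative_to(root).as_posix()
            for nb in directory.rglob("*.ipynb")
            if ".ipynb_checkpoints" not in nb.relative_to(root).parts
        )
        if notebooks:
            result[directory.name] = notebooks
    return result
```

Calling notebooks_by_directory with the path to your own clone returns a dictionary keyed by directory name, which makes it easy to see which notebooks accompany the chapter you are reading.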

Data sources

We will use freely available historical data from market, fundamental, and alternative sources. Chapter 2 and Chapter 3 cover the characteristics of these data sources and how to access them, and introduce key providers that we will use throughout the book. The companion GitHub repository just described contains instructions on how to obtain or create some of the datasets that we will use throughout, and it includes some of the smaller datasets.

The data sources that we will work with include, among others:

Nasdaq ITCH order book data
Electronic Data Gathering, Analysis, and Retrieval (EDGAR) SEC filings
Earnings call transcripts from Seeking Alpha
Quandl daily prices and other data points for over 3,000 US stocks
International equity data from Stooq and via the yfinance library
Various macro fundamental and benchmark data from the Federal Reserve
Large Yelp business reviews and Twitter datasets
EUROSAT satellite image data

Some of the datasets are large (several gigabytes), such as the Nasdaq ITCH data and the SEC filings. The notebooks indicate when that is the case.

See the data directory in the root folder of the GitHub repository for instructions.

Anaconda and Docker images

The book requires Python 3.7 or higher and uses the Anaconda distribution. The book uses various conda environments for the four parts to cover a broad range of libraries while limiting dependencies and conflicts.
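Before creating the environments, it can help to confirm that the interpreter meets the stated minimum. The snippet below is a small sketch using only the standard library. The helper name meets_minimum and the constant MINIMUM_PYTHON are illustrative, not part of the book's code.

```python
import sys

# Minimum version stated in the text above.
MINIMUM_PYTHON = (3, 7)


def meets_minimum(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (major, minor, ...) version is at least `minimum`.

    Defaults to the version of the running interpreter.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if not meets_minimum():
    print("This interpreter is older than Python 3.7; please upgrade.")
```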

The installation directory in the GitHub repository contains detailed instructions. You can either use the provided Docker image to create a container with the necessary environments or use the .yml files to create them locally.

Download the example code files

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at http://www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the on-screen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of your preferred compression tool:

WinRAR or 7-Zip for Windows
Zipeg, iZip, or UnRarX for Mac
7-Zip or PeaZip for Linux
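If you prefer to stay within Python, the standard library's zipfile module can also extract the download. This is a sketch under the assumption that the downloaded bundle is a ZIP archive. The function name extract_code_bundle and any file names you pass to it are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_code_bundle(archive_path, target_dir):
    """Extract a ZIP archive into target_dir and return the archived member names.

    Assumption: the download is a .zip file; other formats (for example .rar)
    are not handled by the zipfile module.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()
```

The function creates the target directory if it does not exist, so it can be pointed at a fresh location for each download.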

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Machine-Learning-for-Algorithmic-Trading-Second-Edition. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781839217715_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example, "The compute_factors() method creates a MeanReversion factor instance and creates long, short, and ranking pipeline columns."

A block of code is set as follows:

from pykalman import KalmanFilter
kf = KalmanFilter(transition_matrices=[1],
                  observation_matrices=[1],
                  initial_state_mean=0,
                  initial_state_covariance=1,
                  observation_covariance=1,
                  transition_covariance=.01)
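
To make the parameters in the sample block concrete, the following pure-Python sketch runs the standard Kalman filter recursions for a one-dimensional model with the same settings. It is an illustration of the textbook predict and update steps, not pykalman's implementation. The function name scalar_kalman_filter and the sample observations are made up for this example.

```python
def scalar_kalman_filter(observations,
                         transition=1.0,
                         observation=1.0,
                         initial_state_mean=0.0,
                         initial_state_covariance=1.0,
                         observation_covariance=1.0,
                         transition_covariance=0.01):
    """Return filtered state means and variances for a 1-D linear-Gaussian model."""
    means, variances = [], []
    # The initial state distribution serves as the prior for the first observation.
    pred_mean, pred_var = initial_state_mean, initial_state_covariance
    for t, z in enumerate(observations):
        if t > 0:
            # Predict: push the previous estimate through the transition model.
            pred_mean = transition * means[-1]
            pred_var = transition ** 2 * variances[-1] + transition_covariance
        # Update: blend the prediction with the new observation via the Kalman gain.
        innovation_var = observation ** 2 * pred_var + observation_covariance
        gain = pred_var * observation / innovation_var
        means.append(pred_mean + gain * (z - observation * pred_mean))
        variances.append((1.0 - gain * observation) * pred_var)
    return means, variances


# Hypothetical noisy readings of a slowly moving level.
sample_means, sample_variances = scalar_kalman_filter([1.2, 0.9, 1.1, 1.0])
```

A small transition_covariance relative to observation_covariance, as in the settings above, tells the filter to trust its running estimate more than any single observation, so the filtered series changes smoothly.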

Bold: Indicates a new term, an important word, or words that you see on the screen. For instance, words in menus or dialog boxes appear in the text like this. For example, "The Python Algorithmic Trading Library (PyAlgoTrade) focuses on backtesting and offers support for paper trading and live trading."

Informational notes appear like this.