Preface
Aided by the availability of vast amounts of data and computing resources, machine learning (ML) has made big strides. The financial industry, which at its heart is an information processing enterprise, holds an enormous amount of opportunity for the deployment of these new technologies.
This book is a practical guide to modern ML applied in the financial industry. Using a code-first approach, it will teach you how the most useful ML algorithms work, and how to use them to solve real-world problems.
Who this book is for
There are three kinds of people who would benefit the most from this book:
Data scientists who want to break into finance and would like to know about the spectrum of possible applications and relevant problems
Developers in any FinTech business or quantitative finance professionals who look to upgrade their skill set and want to incorporate advanced ML methods into their modeling process
Students who would like to prepare themselves for the labor market and learn some practical skills valued by employers
This book assumes you have some working knowledge of linear algebra, statistics, probability theory, and calculus. However, you do not have to be an expert in any of those topics.
To follow the code examples, you should be comfortable with Python and the most common data science libraries, such as pandas, NumPy, and Matplotlib. The book's example code is presented in Jupyter Notebooks.
Explicit knowledge of finance is not required.
What this book covers
Chapter 1, Neural Networks and Gradient-Based Optimization, will explore what kinds of ML there are, and the motivations for using them in different areas of the financial industry. We will then learn how neural networks work and build one from scratch.
Chapter 2, Applying Machine Learning to Structured Data, will deal with data that resides in a fixed field within, for example, a relational database. We will walk through the process of model creation: from forming a heuristic, to building a simple model on engineered features, to a fully learned solution. On the way, we will learn about how to evaluate our models with scikit-learn, how to train tree-based methods such as random forests, and how to use Keras to build a neural network for this task.
Chapter 3, Utilizing Computer Vision, describes how computer vision allows us to perceive and interpret the real world at scale. In this chapter, we will learn the mechanisms with which computers can learn to identify image content. We will learn about convolutional neural networks and the Keras building blocks we need to design and train state-of-the-art computer vision models.
Chapter 4, Understanding Time Series, looks at the large number of tools devoted to the analysis of temporally related data. In this chapter, we will first discuss the "greatest hits" that industry professionals have been using to model time series and how to use them efficiently with Python. We will then discover how modern ML algorithms can find patterns in time series and how they are complemented by classic methods.
Chapter 5, Parsing Textual Data with Natural Language Processing, uses the spaCy library and a large corpus of news to discuss how common tasks such as named entity recognition and sentiment analysis can be performed quickly and efficiently. We will then learn how we can use Keras to build our own custom language models. The chapter introduces the Keras functional API, which allows us to build much more complex models that can, for instance, translate between languages.
Chapter 6, Using Generative Models, explains how generative models generate new data. This is useful when we either do not have enough data or want to analyze our data by learning about how the model perceives it. In this chapter, we will learn about (variational) autoencoders as well as generative adversarial models. We will learn how to make sense of them using the t-SNE algorithm and how to use them for unconventional purposes, such as catching credit card fraud. We will learn about how we can supplement human labeling operations with ML to streamline data collection and labeling. Finally, we will learn how to use active learning to collect the most useful data and greatly reduce data needs.
Chapter 7, Reinforcement Learning for Financial Markets, looks at reinforcement learning, which is an approach that does not require a human-labeled "correct" answer for training, but only a reward signal. In this chapter, we will discuss and implement several reinforcement learning algorithms, from Q-learning to Advantage Actor-Critic (A2C). We will discuss the underlying theory, its connection to economics, and in a practical example, see how reinforcement learning can be used to directly inform portfolio formation.
Chapter 8, Privacy, Debugging, and Launching Your Products, addresses the many things that can go wrong when building and shipping complex models. We will discuss how to debug and test your data, how to keep sensitive data private while training models on it, how to prepare your data for training, and how to disentangle why your model is making the predictions it makes. We will then look at how to automatically tune your model's hyperparameters, how to use the learning rate to reduce overfitting, and how to diagnose and avoid exploding and vanishing gradients. After that, the chapter explains how to monitor and understand the right metrics in production. Finally, it discusses how you can improve the speed of your models.
Chapter 9, Fighting Bias, discusses how ML models can learn unfair policies and even break anti-discrimination laws. It highlights several approaches to improve model fairness, including pivot learning and causal learning. It shows how to inspect models and probe for bias. Finally, we discuss how unfairness can be a failure in the complex system that your model is embedded in and give a checklist that can help you reduce bias.
Chapter 10, Bayesian Inference and Probabilistic Programming, uses PyMC3 to discuss the theory and practical advantages of probabilistic programming. We will implement our own sampler, understand Bayes' theorem numerically, and finally learn how we can infer the distribution of volatility from stock prices.
To get the most out of this book
All code examples are hosted on Kaggle. You can use Kaggle for free and get access to a GPU, which will enable you to run the example code much faster. If you do not have a very powerful machine with a GPU, it will be much more comfortable to run the code on Kaggle. You can find links to all notebooks on this book's GitHub page: https://github.com/PacktPublishing/Machine-Learning-for-Finance.
This book assumes some working knowledge of mathematical concepts such as linear algebra, statistics, probability theory, and calculus. You do not have to be an expert, however.
Equally, knowledge of Python and some popular data science libraries such as pandas and Matplotlib is assumed.
Download the example code files
You can download the example code files for this book from your account at http://www.packt.com. If you purchased this book elsewhere, you can visit http://www.packt.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register at http://www.packt.com.
Select the SUPPORT tab.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box and follow the on-screen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Download the color images
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781789136364_ColorImages.pdf.
Conventions used
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in the text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."
A block of code is set as follows:
import numpy as np
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
x_train.shape
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
from keras.models import Sequential
from keras.layers import Conv2D
img_shape = (28,28,1)
model = Sequential()
model.add(Conv2D(6,3,input_shape=img_shape))
Any command-line input or output is written as follows:
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 22s 374us/step - loss: 7707.2773 - acc: 0.6556 - val_loss: 55.7280 - val_acc: 0.7322
Bold: Indicates a new term, an important word, or words that you see on the screen. For example, words in menus or dialog boxes appear in the text like this: "Select System info from the Administration panel."
Note
Warnings or important notes appear like this.
Note
Tips and tricks appear like this.
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit http://www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com.
Reviews
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.