Understanding the components of an algorithmic trading system
A client-side algorithmic trading infrastructure can be broken down broadly into two categories: core infrastructure and quantitative infrastructure.
The core infrastructure of an algorithmic trading system
A core infrastructure handles communication with the exchange using market data and order entry protocols. It is responsible for relaying information between the exchange and the algorithmic trading strategy.
Its components are also responsible for capturing, timestamping, and recording historical market data, which is one of the top priorities for algorithmic trading strategy research and development.
The core infrastructure also includes a layer of risk management components to guard the trading system against erroneous or runaway trading strategies to prevent catastrophic outcomes.
Finally, some of the less glamorous tasks involved in the algorithmic trading business, such as back-office reconciliation tasks, compliance, and more, are also addressed by the core infrastructure.
Trading servers
The trading server comprises one or more computers that receive and process market data and other relevant exchange information (for example, the order book) and issue trading orders.
Updates to the exchange's limit order book, maintained by the matching engine, are disseminated to all market participants over market data protocols.
Market participants have trading servers that receive these market data updates. While, technically, these trading servers could be anywhere in the world, modern algorithmic trading participants place their trading servers in a data center very close to the exchange matching engine. This is called a colocated or Direct Market Access (DMA) setup, which ensures that participants receive market data updates as quickly as possible by minimizing their distance to the matching engine.
Once the market data update, which is communicated via exchange-provided market data protocols, is received by each market participant, they use software applications known as market data feed handlers to decode the market data updates and feed them to the algorithmic trading strategy on the client side.
Once the algorithmic trading strategy has digested the market data update, based on the intelligence developed in the strategy, it generates outgoing order flow. This can be the addition, modification, or cancellation of orders at specific prices and quantities.
The order requests are picked up by a client component, often a separate one, known as the order entry gateway. The order entry gateway communicates with the exchange using order entry protocols to translate this request from the strategy to the exchange. Notifications in response to these order requests are sent by the electronic exchange back to the order entry gateway. In turn, the matching engine generates market data updates in response to this order flow, returning us to the beginning of this information flow loop.
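The information flow described above can be sketched, in highly simplified form, as follows. All class names, the wire format, and the strategy logic are hypothetical placeholders; real feed handlers and gateways implement exchange-specific binary protocols.

```python
class FeedHandler:
    """Decodes raw exchange market data updates (format is exchange-specific;
    a simple "symbol,price,quantity" text format is assumed here)."""
    def decode(self, raw_update):
        symbol, price, qty = raw_update.split(",")
        return {"symbol": symbol, "price": float(price), "qty": int(qty)}

class OrderEntryGateway:
    """Translates strategy order requests into exchange order entry messages."""
    def __init__(self):
        self.sent = []
    def send(self, order):
        # A real gateway would encode this in the exchange's order entry protocol.
        self.sent.append(order)

class Strategy:
    """Toy strategy: buy one lot whenever the traded price drops below a threshold."""
    def __init__(self, gateway, buy_below):
        self.gateway = gateway
        self.buy_below = buy_below
    def on_market_data(self, update):
        if update["price"] < self.buy_below:
            self.gateway.send({"side": "BUY", "symbol": update["symbol"],
                               "price": update["price"], "qty": 1})

gateway = OrderEntryGateway()
strategy = Strategy(gateway, buy_below=100.0)
feed = FeedHandler()
for raw in ["ABC,101.5,200", "ABC,99.8,100"]:
    strategy.on_market_data(feed.decode(raw))
print(gateway.sent)  # one BUY order, generated by the 99.8 update
```

In a real system, each of these components runs as a separate low-latency process, and the gateway's execution notifications feed back into the strategy.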
The quantitative infrastructure of an algorithmic trading system
A quantitative infrastructure builds on top of the platform provided by the core infrastructure, adding components to research and develop trading strategies and to leverage that platform effectively to generate revenue.
The research framework includes components such as backtesting, Post-Trade Analytics (PTA), and signal research components.
Other components that are used in research as well as deployed to live markets would be limit order books, predictive signals, and signal aggregators, which combine individual signals into a composite signal.
Execution logic components use trading signals and do the heavy lifting of managing live orders, positions, and Profit And Loss (PnL) across different strategies and trading instruments.
Finally, trading strategies themselves have a risk management component to manage and mitigate risk across different strategies and instruments.
Trading strategies
Profitable trading ideas have always been driven by human intuition developed from observing the patterns of market conditions and the outcomes of various strategies under different market conditions.
For example, historically, it has been observed that large market rallies generate investor confidence, causing more market participants to jump in and buy more, thereby recursively causing even larger rallies. Conversely, large drops in market prices scare off participants invested in the trading instrument, causing them to sell their holdings and exacerbate the drop in prices. These intuitive ideas, backed by observations in markets, led to the idea of trend-following strategies.
It has also been observed that short-term volatile moves in either direction often tend to revert to the previous market price, leading to mean reversion-based speculators and trading strategies. Similarly, historical observations that similar products' prices move together, which also makes intuitive sense, have led to correlation- and collinearity-based trading strategies, such as statistical arbitrage and pairs trading.
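The two intuitions above can be sketched as toy signals over a price series. The lookback windows and the -1/0/+1 outputs are illustrative choices, not a recipe from the text.

```python
def trend_signal(prices, lookback=3):
    """Trend following: +1 if price rose over the lookback window, -1 if it fell."""
    move = prices[-1] - prices[-1 - lookback]
    return (move > 0) - (move < 0)

def mean_reversion_signal(prices, lookback=5):
    """Mean reversion: +1 (buy) if price is below its moving average, -1 (sell) if above."""
    ma = sum(prices[-lookback:]) / lookback
    deviation = prices[-1] - ma
    return (deviation < 0) - (deviation > 0)

prices = [100, 101, 102, 104, 103, 99]
print(trend_signal(prices))           # -1: price fell over the last 3 ticks
print(mean_reversion_signal(prices))  # 1: price is below its 5-tick average, expect a bounce
```

Note that on this series the two signals disagree, which is exactly why no single strategy is profitable in all conditions.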
Since every market participant uses different trading strategies, the final market prices reflect the views of the majority of market participants. Trading strategies whose views align with the majority of market participants are profitable under those conditions. A single trading strategy generally cannot be profitable 100 percent of the time, so sophisticated participants run a portfolio of trading strategies.
Trading signals
Trading signals are also referred to as features, calculators, indicators, predictors, or alpha.
Trading signals are what drive algorithmic trading strategy decisions. Signals are well-defined pieces of intelligence derived from market data, alternative data (such as news, social media feeds, and more), and even our own order flow, which is designed to predict certain market conditions in the future.
Signals almost always originate from some intuitive idea and observation of certain market conditions and/or strategy performance. Most quantitative developers spend the majority of their time researching and developing new trading signals to improve profitability under different market conditions and to improve the algorithmic trading strategy overall.
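One widely discussed example of such a signal is order book imbalance: the relative weight of resting buy versus sell quantity at the top of the book, often used as a short-horizon direction predictor. The function below is a minimal sketch; the name and the -1 to +1 scale are our own conventions.

```python
def book_imbalance(bid_qty, ask_qty):
    """Ranges from -1 (all resting sell interest) to +1 (all resting buy interest)."""
    return (bid_qty - ask_qty) / (bid_qty + ask_qty)

print(book_imbalance(bid_qty=300, ask_qty=100))  # 0.5: buy pressure dominates
```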
The trading signal research framework
A lot of man-hours are invested in researching and discovering new signals to improve trading performance. To do that in a systematic, efficient, scalable, and scientific manner, often, the first step is to build a good signal research framework.
This framework has subcomponents for the following:
- The generation of data, based on the signal we are trying to build and the market conditions/objectives we are trying to capture/predict. In most real-world algorithmic trading, we use tick data, which represents every single event in the market. As you might imagine, there are a lot of events every day, and this leads to massive amounts of data, so you also need to think about subsampling the data received. Subsampling has several advantages, such as reducing the scale of data, eliminating noisy/spurious patches of data, and highlighting interesting/important data.
- The evaluation of the predictive power or usefulness of features concerning the market objective that they are trying to capture/predict.
- The maintenance of historical results of signals under different market conditions along with tuning existing signals to changing market conditions.
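As a sketch of the subsampling step described in the first bullet, the following collapses raw ticks into fixed-interval buckets, keeping the last traded price seen in each bucket. The `(timestamp, price)` tuple format and the function name are illustrative assumptions; real tick data carries far richer fields.

```python
def subsample_ticks(ticks, interval_s):
    """ticks: list of (timestamp_s, price); returns one (bucket_ts, last_price) per interval."""
    bars = {}
    for ts, price in ticks:
        bucket = int(ts // interval_s) * interval_s
        bars[bucket] = price  # later ticks in the same bucket overwrite earlier ones
    return sorted(bars.items())

ticks = [(0.1, 100.0), (0.4, 100.5), (1.2, 101.0), (2.9, 100.8), (3.0, 100.6)]
print(subsample_ticks(ticks, interval_s=1))
# [(0, 100.5), (1, 101.0), (2, 100.8), (3, 100.6)]
```

Other common subsampling schemes include volume bars (one bar per fixed traded quantity) and event-driven sampling that keeps only "interesting" ticks.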
Signal aggregators
Signal aggregators are optional components that take inputs from individual signals and aggregate them in different ways to generate a new composite signal.
A very simple aggregation method would be to take the average of all the input signals and output the average as the composite signal value.
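That simple averaging scheme can be sketched as follows; the three input values and the equal weighting are illustrative assumptions.

```python
def aggregate_signals(signal_values):
    """Composite signal = arithmetic mean of the individual signal values."""
    return sum(signal_values) / len(signal_values)

# Three hypothetical signals, each predicting short-term price direction
# on a -1 (sell) to +1 (buy) scale:
composite = aggregate_signals([0.5, 0.25, 0.75])
print(composite)  # 0.5
```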
Readers familiar with statistical learning concepts of ensemble learning – bagging and boosting – might be able to spot a similarity between those learning models and signal aggregators. Oftentimes signal aggregators are just statistical models (regression/classification) where the input signals are just features used to predict the same final market objective.
The execution of strategies
The execution of strategies deals with efficiently managing and executing orders based on the outputs of the trading signals to minimize trading fees and slippage.
Slippage is the difference between the expected market price and the actual execution price. It is caused both by the latency an order experiences in reaching the market before prices change and by the size of the order itself moving the price once it hits the market.
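A minimal sketch of measuring slippage per fill, assuming we record the price observed when the order was sent (the decision price); the function name and sign convention are our own.

```python
def slippage(side, decision_price, execution_price):
    """Positive result means the fill was worse than the decision price."""
    if side == "BUY":
        return execution_price - decision_price  # paid more than intended
    return decision_price - execution_price      # sold for less than intended

print(round(slippage("BUY", 100.00, 100.05), 2))   # paid 0.05 more than intended
print(round(slippage("SELL", 100.00, 99.98), 2))   # received 0.02 less than intended
```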
The quality of execution strategies employed in an algorithmic trading strategy can significantly improve/degrade the performance of profitable trading signals.
Limit order books
Limit order books are built both in the exchange matching engine and within the algorithmic trading strategies, although not all algorithmic trading signals/strategies require the entire limit order book.
Sophisticated algorithmic trading strategies can build a lot more intelligence into their limit order books. We can detect and track our own orders in the limit book and understand, given our priority, what the probability of our orders getting executed is. We can also use this information to infer that our own orders have been executed even before the order entry gateway receives the execution notification from the exchange, and leverage that ability to our advantage. Other, more complex microstructure features, such as detecting icebergs, detecting stop orders, detecting large in-flows or out-flows of buy/sell orders, and more, are all possible with limit order books and the market data updates available at many electronic trading exchanges.
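A minimal client-side book sketch, assuming simplified (side, price, quantity) level updates rather than a real exchange protocol; production books track per-order queues, not just aggregate levels.

```python
class LimitOrderBook:
    """Price levels mapped to aggregate resting quantity, per side."""
    def __init__(self):
        self.bids = {}  # price -> total quantity
        self.asks = {}

    def apply(self, side, price, qty):
        """qty == 0 removes the level; otherwise it sets the new level quantity."""
        book = self.bids if side == "BID" else self.asks
        if qty == 0:
            book.pop(price, None)
        else:
            book[price] = qty

    def best_bid(self):
        return max(self.bids) if self.bids else None

    def best_ask(self):
        return min(self.asks) if self.asks else None

lob = LimitOrderBook()
lob.apply("BID", 99.5, 100)
lob.apply("BID", 99.0, 250)
lob.apply("ASK", 100.0, 80)
lob.apply("BID", 99.5, 0)  # top bid level fully cancelled/traded
print(lob.best_bid(), lob.best_ask())  # 99.0 100.0
```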
Position and PnL management
Let's explore how positions and PnLs evolve as a trading strategy opens and closes long and short positions by executing trades.
When a strategy does not have a position in the market, that is, price changes do not affect the trading account's value, it is referred to as having a flat position.
From a flat position, if a buy order executes, then it is referred to as having a long position. If a strategy has a long position and prices increase, the position profits from the price increase. PnL also increases in this scenario, that is, profit increases (or loss decreases). Conversely, if a strategy has a long position and prices decrease, the position loses from the price decrease. PnL decreases in this scenario, that is, profit decreases (or loss increases).
From a flat position, if a sell order is executed then it is referred to as having a short position. If a strategy has a short position and prices decrease, the position profits from the price decrease. PnL increases in this scenario. Conversely, if a strategy has a short position and prices increase, then PnL decreases. PnL for a position that is still open is referred to as unrealized PnL since PnL changes with price changes as long as the position remains open.
A long position is closed by selling an amount of the instrument equivalent to the position size. This is referred to as closing or flattening a position, and, at this point, the PnL is referred to as realized PnL, since it no longer changes with price moves once the position is closed.
Similarly, short positions are closed by buying the same amount as the position size.
At any point, the total PnL is the sum of realized PnLs on all closed positions and unrealized PnLs on all open positions.
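The lifecycle above can be sketched with a toy position tracker. The class is illustrative: it deliberately ignores position flips through flat, fees, and multi-instrument bookkeeping.

```python
class Position:
    def __init__(self):
        self.qty = 0          # +long / -short / 0 flat
        self.avg_price = 0.0
        self.realized_pnl = 0.0

    def fill(self, signed_qty, price):
        if self.qty != 0 and (signed_qty * self.qty) < 0:
            # Closing (part of) the position realizes PnL against the average price.
            # Flips through flat are not handled in this sketch.
            closed = min(abs(signed_qty), abs(self.qty))
            direction = 1 if self.qty > 0 else -1
            self.realized_pnl += closed * direction * (price - self.avg_price)
            self.qty += signed_qty
            if self.qty == 0:
                self.avg_price = 0.0
        else:
            # Opening or adding: update the average entry price.
            total = self.qty + signed_qty
            self.avg_price = (self.avg_price * self.qty + price * signed_qty) / total
            self.qty = total

    def unrealized_pnl(self, market_price):
        return self.qty * (market_price - self.avg_price)

pos = Position()
pos.fill(+10, 100.0)                    # buy 10 @ 100 -> long position
print(pos.unrealized_pnl(101.0))        # 10.0 while the position is open
pos.fill(-10, 101.0)                    # sell 10 @ 101 -> flat again
print(pos.realized_pnl)                 # 10.0, now locked in
```

Total PnL at any moment is then `pos.realized_pnl + pos.unrealized_pnl(current_price)`, summed over all instruments.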
When a long or short position is composed of buys or sells at multiple prices with different sizes, the average price of the position is computed as the Volume Weighted Average Price (VWAP), that is, each execution price weighted by the quantity executed at that price. Marking to market refers to comparing a position's VWAP against the current market price to get a sense of how profitable or unprofitable a certain long/short position is.
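A small sketch of computing a position's VWAP and marking it to market, with illustrative fills:

```python
def vwap(fills):
    """fills: list of (price, quantity); returns the quantity-weighted average price."""
    total_qty = sum(q for _, q in fills)
    return sum(p * q for p, q in fills) / total_qty

fills = [(100.0, 50), (101.0, 30), (102.0, 20)]    # long 100 units in total
position_vwap = vwap(fills)
print(position_vwap)                                # 100.7

# Marking to market against a current price of 101.5 (hypothetical):
mark_to_market_pnl = (101.5 - position_vwap) * 100
print(round(mark_to_market_pnl, 2))                 # 80.0
```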
Backtesting
A backtester uses historically recorded market data and simulation components to simulate the behavior and performance of an algorithmic trading strategy as if it were deployed to live markets in the past. Algorithmic trading strategies are developed and optimized using a backtester until the strategy performance is in line with expectations.
Backtesters are complex components that need to model market data flow, client-side and exchange-side latencies in software and network components, accurate FIFO priorities, slippage, fees, and market impact from strategy order flow (that is, how would other market participants react to a strategy's order flow being added to the market data flow) to generate accurate strategy and portfolio performance statistics.
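To make the idea concrete, here is a deliberately naive backtest loop over recorded prices, driving a toy moving-average crossover strategy. It assumes instant fills at the observed price and ignores everything a real backtester must model: latencies, queue priority, slippage, fees, and market impact.

```python
def backtest(prices, fast=2, slow=3):
    """Toy crossover strategy: long 1 unit when the fast MA > slow MA, else flat."""
    position, cash = 0, 0.0
    for i in range(slow, len(prices)):
        fast_ma = sum(prices[i - fast:i]) / fast
        slow_ma = sum(prices[i - slow:i]) / slow
        target = 1 if fast_ma > slow_ma else 0
        if target != position:
            cash -= (target - position) * prices[i]  # trade at the current price
            position = target
    cash += position * prices[-1]  # mark any final open position to market
    return cash

prices = [100, 101, 103, 106, 104, 101, 99]
print(backtest(prices))  # -7.0: bought at 106, sold at 99
```

Even this toy loop shows why simulation fidelity matters: allowing the fill at a slightly worse price than `prices[i]` would change the result, which is exactly the slippage and impact modeling the paragraph above describes.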
PTA
PTA is performed on trades generated by an algorithmic trading strategy run in simulation or live markets.
PTA systems are used to generate performance statistics from historically backtested strategies with the objective of understanding historical strategy performance expectations.
When applied to trades generated from live trading strategies, PTA can be used to understand strategy performance in live markets as well as compare and assert that live trading performance is in line with simulated strategy performance expectations.
Risk management
Good risk management principles ensure that strategies are run for optimal PnL performance and safeguards are put in place against runaway/errant strategies.
Bad risk management can not only turn a profitable trading strategy into a non-profitable one but can also put the investor's entire capital at risk due to uncontrolled strategy losses, malfunctioning strategies, and possible regulatory repercussions.