Data download and processing
We'll start by downloading the ticker lists from Wikipedia. This uses the powerful pd.read_html
method we saw in Chapter 4, Long/Short Methodologies: Absolute and Relative:
web_df = pd.read_html(website)[0]
tickers_list = list(web_df['Symbol'])
tickers_list = tickers_list[:]
print('tickers_list',len(tickers_list))
web_df.head()
tickers_list can be truncated by filling in numbers in the bracket section of tickers_list[:].
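For instance, a minimal sketch of the truncation (the number 50 is purely illustrative, not part of the original code): to run the engine on only the first 50 constituents while testing, fill in the stop value of the slice:

tickers_list = tickers_list[:50]   # hypothetical cap: keep only the first 50 tickers
print('tickers_list', len(tickers_list))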
Now, this is where the action happens. There are a few nested loops in the engine room.
- Batch download: this is the high-level loop. OHLCV is downloaded into a multi-index dataframe in a succession of batches. The number of iterations is a function of the length of the tickers list and the batch size: 505 constituents divided by a batch size of 20 gives 26 batches (the last batch being 5 tickers long), as illustrated in the sketch after this list.
- Drop level loop: this breaks the multi-index dataframe into single ticker OHLCV dataframes. The number of iterations equals the batch size. Regimes are processed at this level.
- Absolute/relative process: There are 2 passes. The first pass processes data in the absolute series. Variables are reset to the relative series at the end and then processed accordingly in the second pass. There is an option to save the ticker information as a CSV file. The last row dictionary is created at the end of the second pass.
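As a quick, standalone sketch of the batch arithmetic (the 505 and 20 are just the S&P 500 example from above, not new inputs):

import math

num_tickers = 505                                    # e.g. S&P 500 constituents
batch_size = 20
num_batches = math.ceil(num_tickers / batch_size)    # 26 batches
last_batch = num_tickers - (num_batches - 1) * batch_size
print(num_batches, last_batch)                       # 26 batches, the last one 5 tickers long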
Next, let's go through the process step-by-step:
- Benchmark: download the closing price and apply the currency adjustment. This needs to be done only once, so it is placed at the beginning of the sequence.
- Instantiate the dataframes and lists.
- Loop size: the number of iterations necessary to loop over the tickers_list.
- Outer loop, batch download:
    - m, n: indexes along the batch_list.
    - batch_download: download using yfinance.
    - Print the batch tickers, with a Boolean if you want to see the ticker names.
    - Download the batch.
    - try/except: append to the failed list.
- Second loop, single-stock drop level loop (a sketch of a drop-level helper follows this list):
    - Drop level down to the ticker level.
    - Calculate swings and regime: abs/rel.
- Third loop, absolute/relative series:
    - Process regimes in the absolute series.
    - Reset variables to the relative series and process regimes a second time.
    - Boolean to provide a save_ticker_df option.
    - Create a dictionary with the last row values.
- Append the dictionary to the list of rows.
- Create a dataframe last_row_df from the list of dictionaries. The score column is a lateral sum of the regime methods in absolute and relative.
- Join last_row_df with web_df.
- Boolean save_regime_df option.
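Before the full listing, here is a minimal sketch of what a drop-level helper in the spirit of yf_droplevel might look like. It assumes the batch was downloaded with group_by='column', so the columns form a MultiIndex of (field, ticker); the helper used elsewhere in this book may differ in detail:

def yf_droplevel(batch_download, ticker):
    # keep only the columns that belong to this ticker (second level of the column MultiIndex)
    df = batch_download.iloc[:, batch_download.columns.get_level_values(1) == ticker]
    # drop the ticker level, leaving a plain single-ticker OHLCV dataframe
    df.columns = df.columns.droplevel(1)
    return df.dropna()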
Let's publish the code and give further explanations afterwards:
# Appendix: The Engine Room
bm_df = pd.DataFrame()
bm_df[bm_col] = round(yf.download(tickers=bm_ticker, start=start, end=end, interval="1d",
                                  group_by='column', auto_adjust=True, prepost=True,
                                  threads=True, proxy=None)['Close'], dgt)
bm_df[ccy_col] = 1
print('benchmark', bm_df.tail(1))

regime_df = pd.DataFrame()
last_row_df = pd.DataFrame()
last_row_list = []
failed = []

loop_size = int(len(tickers_list) // batch_size) + 2
for t in range(1, loop_size):  # outer loop: batch download
    m = (t - 1) * batch_size
    n = t * batch_size
    batch_list = tickers_list[m:n]
    if show_batch:
        print(batch_list, m, n)
    try:
        batch_download = round(yf.download(tickers=batch_list, start=start, end=end,
                                           interval="1d", group_by='column', auto_adjust=True,
                                           prepost=True, threads=True, proxy=None), dgt)
        for flat, ticker in enumerate(batch_list):  # second loop: drop level to single ticker
            df = yf_droplevel(batch_download, ticker)
            df = swings(df, rel=False)
            df = regime(df, lvl=3, rel=False)
            df = swings(df, rel=True)
            df = regime(df, lvl=3, rel=True)
            _o, _h, _l, _c = lower_upper_OHLC(df, relative=False)
            for a in range(2):  # third loop: absolute pass, then relative pass
                df['sma' + str(_c)[:1] + str(st) + str(lt)] = regime_sma(df, _c, st, lt)
                df['bo' + str(_h)[:1] + str(_l)[:1] + str(slow)] = regime_breakout(df, _h, _l, window)
                df['tt' + str(_h)[:1] + str(fast) + str(_l)[:1] + str(slow)] = turtle_trader(df, _h, _l, slow, fast)
                _o, _h, _l, _c = lower_upper_OHLC(df, relative=True)  # reset variables for the relative pass
            try:
                last_row_list.append(last_row_dictionary(df))
            except:
                failed.append(ticker)
    except:
        failed.append(ticker)

last_row_df = pd.DataFrame.from_dict(last_row_list)
if save_last_row_df:
    last_row_df.to_csv('last_row_df_' + str(last_row_df['date'].max()) + '.csv', date_format='%Y%m%d')
print('failed', failed)
last_row_df['score'] = last_row_df[regime_cols].sum(axis=1)
regime_df = web_df[web_df_cols].set_index('Symbol').join(
    last_row_df[last_row_df_cols].set_index('Symbol'), how='inner').sort_values(by='score')
if save_regime_df:
    regime_df.to_csv('regime_df_' + str(last_row_df['date'].max()) + '.csv', date_format='%Y%m%d')
last_row_list.append(last_row_dictionary(df)) happens at the end of the third loop, once every individual ticker has been fully processed. This list automatically updates for every ticker and every batch. Once the three loops are finished, we create the last_row_df dataframe from this list of dictionaries using pd.DataFrame.from_dict(last_row_list). Creating a list of dictionaries and rolling it up into a dataframe is marginally faster than appending rows directly to a dataframe. The score column is a lateral sum of all the regime methodologies. The last row dataframe is then sorted by score in ascending order, and there is an option to save a datestamped version. The regime dataframe is created by joining the Wikipedia web dataframe with the last row dataframe. Note that the Symbol column is set as the index. Again, there is an option to save a datestamped version.
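To make the list-of-dictionaries pattern concrete, here is a small self-contained sketch with made-up tickers, sectors, and regime columns (none of these names come from the engine room code):

import pandas as pd

rows = []
for ticker, bo, tt, sma in [('AAA', 1, 1, -1), ('BBB', -1, -1, -1)]:
    # one dictionary per ticker, appended as the loops run
    rows.append({'Symbol': ticker, 'bo': bo, 'tt': tt, 'sma': sma})

demo_last_row_df = pd.DataFrame.from_dict(rows)
demo_last_row_df['score'] = demo_last_row_df[['bo', 'tt', 'sma']].sum(axis=1)  # lateral sum

demo_web_df = pd.DataFrame({'Symbol': ['AAA', 'BBB'], 'GICS Sector': ['Tech', 'Energy']})
demo_regime_df = demo_web_df.set_index('Symbol').join(
    demo_last_row_df.set_index('Symbol'), how='inner').sort_values(by='score')
print(demo_regime_df)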
Next, let's visualize what the market is doing with a few heatmaps.