Data download and processing
We'll start by downloading the ticker lists from Wikipedia. This uses the powerful pd.read_html
function we saw in Chapter 4, Long/Short Methodologies: Absolute and Relative:
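import pandas as pd
# website is assumed to hold the URL of the Wikipedia constituents page, defined earlier
web_df = pd.read_html(website)[0]  # [0] selects the first table found on the page
tickers_list = list(web_df['Symbol'])
tickers_list = tickers_list[:]  # fill in slice bounds here to work on a subset
print('tickers_list', len(tickers_list))
web_df.head()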
tickers_list can be truncated by filling in numbers in the bracket section of tickers_list[:]. For instance, tickers_list[:20] keeps only the first 20 tickers.
Now, this is where the action happens. There are a few nested loops in the engine room:
- Batch download: this is the high-level loop. OHLCV is downloaded into a multi-index dataframe in a succession of batches. The number of iterations is a function of the length of the tickers list and the batch size. 505 constituents divided by a batch size of 20 is 25.25, which rounds up to 26 batches (the last batch being 5 tickers long).
- Drop level loop: this breaks the multi-index dataframe into single ticker OHLCV dataframes...
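The batch arithmetic in the high-level loop can be checked with a minimal sketch. The helper make_batches and the placeholder ticker names are hypothetical and used for illustration only. The download of each batch and the drop level step are left out, so this shows only how the list might be sliced:

```python
import math

def make_batches(tickers, batch_size):
    """Split a list of tickers into consecutive batches of at most batch_size."""
    return [tickers[i:i + batch_size] for i in range(0, len(tickers), batch_size)]

# Placeholder symbols standing in for the 505 constituents (not real tickers)
tickers_list = [f"TICKER{n:03d}" for n in range(505)]
batch_size = 20

batches = make_batches(tickers_list, batch_size)

# The number of iterations is the list length divided by the batch size, rounded up
n_batches = math.ceil(len(tickers_list) / batch_size)

print(len(batches), n_batches)  # 26 26
print(len(batches[-1]))         # 5
```

Slicing past the end of a list is safe in Python, so the last batch simply comes out shorter than the others without any special handling.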