Python for Finance Cookbook – Second Edition

Over 80 powerful recipes for effective financial data analysis

Product type Paperback
Published in Dec 2022
Publisher Packt
ISBN-13 9781803243191
Length 740 pages
Edition 2nd Edition
Author: Eryk Lewinson

Getting data from Nasdaq Data Link

Alternative data can be anything that is considered non-market data, for example, weather data for agricultural commodities, satellite images that track oil shipments, or even customer feedback that reflects a company’s service performance. The idea behind using alternative data is to get an “informational edge” that can then be used for generating alpha. In short, alpha is a measure of performance describing an investment strategy’s, trader’s, or portfolio manager’s ability to beat the market.

Quandl was the leading provider of alternative data products for investment professionals (including quant funds and investment banks). Recently, it was acquired by Nasdaq and is now part of the Nasdaq Data Link service. The goal of the new platform is to provide a unified source of trusted data and analytics. Data can be downloaded in several ways, including through a dedicated Python library.

A good starting place for financial data would be the WIKI Prices database, which contains stock prices, dividends, and splits for 3,000 US publicly traded companies. The drawback of this database is that as of April 2018, it is no longer supported (meaning there is no recent data). However, for purposes of getting historical data or learning how to access the databases, it is more than enough.

We use the same example as in the previous recipe: we download Apple’s stock prices for the years 2011 to 2021.

Getting ready

Before downloading the data, we need to create an account at Nasdaq Data Link (https://data.nasdaq.com/) and then verify our email address (otherwise, an exception is likely to occur while downloading the data). We can find our personal API key in our profile (https://data.nasdaq.com/account/profile).

How to do it…

Execute the following steps to download data from Nasdaq Data Link:

  1. Import the libraries:
    import pandas as pd
    import nasdaqdatalink
    
  2. Authenticate using your personal API key:
    nasdaqdatalink.ApiConfig.api_key = "YOUR_KEY_HERE"
    

    You need to replace YOUR_KEY_HERE with your own API key.

  3. Download the data:
    df = nasdaqdatalink.get(dataset="WIKI/AAPL",
                            start_date="2011-01-01",
                            end_date="2021-12-31")
    
  4. Inspect the downloaded data:
    print(f"Downloaded {len(df)} rows of data.")
    df.head()
    

    Running the code generates the following preview of the DataFrame:

Figure 1.2: Preview of the downloaded price information

The result of the request is a DataFrame (1,818 rows) containing the daily OHLC prices, the adjusted prices, dividends, and potential stock splits. As we mentioned in the introduction, the data is limited and is only available until April 2018. The last observation actually comes from March 27, 2018, even though we requested data through the end of 2021.

How it works…

The first step after importing the required libraries was authentication using the API key. When providing the dataset argument, we used the following structure: DATASET/TICKER.

We should keep the API keys secure and private, that is, not share them in public repositories, or anywhere else. One way to make sure that the key stays private is to create an environment variable (how to do it depends on your operating system) and then load it in Python. To do so, we can use the os module. To load the NASDAQ_KEY variable, we could use the following code: os.environ.get("NASDAQ_KEY").
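The environment-variable approach described above can be sketched as follows. This is a minimal illustration rather than a prescribed pattern: the helper name load_api_key is hypothetical, and it assumes the key was stored under NASDAQ_KEY, as in the text.

```python
import os


def load_api_key(var_name: str = "NASDAQ_KEY") -> str:
    """Return the API key stored in an environment variable.

    `load_api_key` is a hypothetical helper, not part of nasdaqdatalink.
    It fails loudly when the variable is missing or empty, so the problem
    surfaces before any download request is made.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set.")
    return key


# Usage (requires the nasdaqdatalink library and a key set in the environment):
# import nasdaqdatalink
# nasdaqdatalink.ApiConfig.api_key = load_api_key()
```

Raising an error instead of silently passing None along makes a missing key easier to diagnose than a failed request later on.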

Some additional details on the get function are:

  • We can specify multiple datasets at once using a list such as ["WIKI/AAPL", "WIKI/MSFT"].
  • The collapse argument can be used to define the frequency (available options are daily, weekly, monthly, quarterly, or annually).
  • The transform argument can be used to carry out some basic calculations on the data prior to downloading. For example, we could calculate row-on-row change (diff), row-on-row percentage change (rdiff), or cumulative sum (cumul) or scale the series to start at 100 (normalize). Naturally, we can easily do the very same operation using pandas.
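As the last bullet notes, the same calculations can be carried out in pandas after downloading. The sketch below uses a small synthetic price series (illustrative values, not real data). The monthly aggregation assumes that collapse keeps the last observation of each period; treat that as an assumption to check against the Nasdaq Data Link documentation.

```python
import pandas as pd

# Synthetic daily closing prices (illustrative values, not real market data)
prices = pd.Series(
    [100.0, 102.0, 101.0, 105.0],
    index=pd.to_datetime(
        ["2021-01-28", "2021-01-29", "2021-02-01", "2021-02-02"]
    ),
    name="close",
)

# pandas counterparts of the transform options
diff = prices.diff()                         # row-on-row change ("diff")
rdiff = prices.pct_change()                  # row-on-row % change ("rdiff")
cumul = prices.cumsum()                      # cumulative sum ("cumul")
normalize = prices / prices.iloc[0] * 100    # series starts at 100 ("normalize")

# Rough counterpart of collapse="monthly", ASSUMING the last observation
# of each month is the one that is kept
monthly = prices.groupby(prices.index.to_period("M")).last()

print(monthly)
```

Doing the transformation locally keeps the raw series available, which is handy when more than one derived series is needed.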

There’s more...

Nasdaq Data Link distinguishes two types of API calls for downloading data. The get function we used before is classified as a time-series API call. We can also use the tables API call with the get_table function.

  1. Download the data for multiple tickers using the get_table function:
    COLUMNS = ["ticker", "date", "adj_close"]
    df = nasdaqdatalink.get_table("WIKI/PRICES", 
                                  ticker=["AAPL", "MSFT", "INTC"], 
                                  qopts={"columns": COLUMNS}, 
                                  date={"gte": "2011-01-01", 
                                        "lte": "2021-12-31"}, 
                                  paginate=True)
    df.head()
    
    Running the code generates the following preview of the DataFrame:

    Figure 1.3: Preview of the downloaded price data

    This function call is a bit more complex than the one we made with the get function. We first specified the table we wanted to use. Then, we provided a list of tickers. As the next step, we specified which columns of the table we were interested in. We also provided the range of dates, where gte stands for greater than or equal to, while lte stands for less than or equal to. Lastly, we indicated that we wanted to use pagination. The tables API is limited to 10,000 rows per call. However, by using paginate=True in the function call, we extend the limit to 1,000,000 rows.

  2. Pivot the data from long format to wide:
    df = df.set_index("date")
    df_wide = df.pivot(columns="ticker")
    df_wide.head()
    

    Running the code generates the following preview of the DataFrame:

Figure 1.4: Preview of the pivoted DataFrame

The output of the get_table function is in the long format. However, to make our analyses easier, we might be interested in the wide format. To reshape the data, we first set the date column as an index and then used the pivot method of a pd.DataFrame.

Please bear in mind that this is not the only way to do so, and pandas contains at least a few helpful methods/functions that can be used for reshaping the data from long to wide and vice versa.
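A few of those alternatives are sketched below on a small synthetic long-format DataFrame. The values are made up for illustration; only the column names mirror the ones requested from get_table.

```python
import pandas as pd

# Synthetic long-format data (illustrative values, not real prices)
long_df = pd.DataFrame(
    {
        "ticker": ["AAPL", "MSFT", "AAPL", "MSFT"],
        "date": pd.to_datetime(
            ["2021-01-04", "2021-01-04", "2021-01-05", "2021-01-05"]
        ),
        "adj_close": [129.0, 217.0, 131.0, 218.0],
    }
)

# Option 1: pivot with explicit index/columns/values (one column per ticker)
wide_pivot = long_df.pivot(index="date", columns="ticker", values="adj_close")

# Option 2: build a (date, ticker) MultiIndex and unstack the ticker level
wide_unstack = long_df.set_index(["date", "ticker"])["adj_close"].unstack("ticker")

# Option 3: pivot_table, which aggregates duplicate (date, ticker) pairs
# (mean by default) instead of raising an error as pivot does
wide_table = long_df.pivot_table(index="date", columns="ticker", values="adj_close")

# Wide back to long with melt
long_again = wide_pivot.reset_index().melt(
    id_vars="date", var_name="ticker", value_name="adj_close"
)

print(wide_pivot)
```

Passing values="adj_close" yields plain ticker column names, whereas calling pivot with only columns="ticker" keeps an extra column level for the value name.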

See also
