Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Python for Finance

You're reading from   Python for Finance Apply powerful finance models and quantitative analysis with Python

Arrow left icon
Product type Paperback
Published in Jun 2017
Publisher
ISBN-13 9781787125698
Length 586 pages
Edition 2nd Edition
Languages
Arrow right icon
Author (1):
Arrow left icon
Yuxing Yan Yuxing Yan
Author Profile Icon Yuxing Yan
Yuxing Yan
Arrow right icon
View More author details
Toc

Table of Contents (17) Chapters Close

Preface 1. Python Basics FREE CHAPTER 2. Introduction to Python Modules 3. Time Value of Money 4. Sources of Data 5. Bond and Stock Valuation 6. Capital Asset Pricing Model 7. Multifactor Models and Performance Measures 8. Time-Series Analysis 9. Portfolio Theory 10. Options and Futures 11. Value at Risk 12. Monte Carlo Simulation 13. Credit Risk Analysis 14. Exotic Options 15. Volatility, Implied Volatility, ARCH, and GARCH Index

Data manipulation

There are many different types of data, such as integer, real number, or string. The following table offers a list of those data types:

Data types

Description

Bool

Boolean (TRUE or FALSE) stored as a byte

Int

Platform integer (normally either int32 or int64)

int8

Byte (-128 to 127)

int16

Integer (-32768 to 32767)

int32

Integer (-2147483648 to 2147483647)

int64

Integer (9223372036854775808 to 9223372036854775807)

unit8

Unsigned integer (0 to 255)

unit16

Unsigned integer (0 to 65535)

unit32

Unsigned integer (0 to 4294967295)

unit64

Unsigned integer (0 to 18446744073709551615)

float

Short and for float6

float32

Single precision float: sign bit23 bits mantissa; 8 bits exponent

float64

52 bits mantissa

complex

Shorthand for complex128

complex64

Complex number; represented by two 32-bit floats (real and imaginary components)

complex128

Complex number; represented by two 64-bit floats (real and imaginary components)

Table 1.1 List of different data types

In the following examples, we assign a value to r, which is a scalar, and several values to pv, which is an array (vector).The type() function is used to show their types:

>>> import numpy as np
>>> r=0.023
>>>pv=np.array([100,300,500])
>>>type(r)
<class'float'>
>>>type(pv)
<class'numpy.ndarray'>

To choose the appropriate decision, we use the round()function; see the following example:

>>> 7/3
2.3333333333333335
>>>round(7/3,5)
2.33333
>>>

For data manipulation, let's look at some simple operations:

>>>import numpy as np
>>>a=np.zeros(10)                      # array with 10 zeros 
>>>b=np.zeros((3,2),dtype=float)       # 3 by 2 with zeros 
>>>c=np.ones((4,3),float)              # 4 by 3 with all ones 
>>>d=np.array(range(10),float)         # 0,1, 2,3 .. up to 9 
>>>e1=np.identity(4)                   # identity 4 by 4 matrix 
>>>e2=np.eye(4)                        # same as above 
>>>e3=np.eye(4,k=1)                    # 1 start from k 
>>>f=np.arange(1,20,3,float)           # from 1 to 19 interval 3 
>>>g=np.array([[2,2,2],[3,3,3]])       # 2 by 3 
>>>h=np.zeros_like(g)                  # all zeros 
>>>i=np.ones_like(g)                   # all ones

Some so-called dot functions are quite handy and useful:

>>> import numpy as np
>>> x=np.array([10,20,30])
>>>x.sum()
60

Anything after the number sign of # will be a comment. Arrays are another important data type:

>>>import numpy as np
>>>x=np.array([[1,2],[5,6],[7,9]])      # a 3 by 2 array
>>>y=x.flatten()
>>>x2=np.reshape(y,[2,3]              ) # a 2 by 3 array

We could assign a string to a variable:

>>> t="This is great"
>>>t.upper()
'THIS IS GREAT'
>>>

To find out all string-related functions, we use dir(''); see the following code:

>>>dir('')
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
>>>

For example, from the preceding list we see a function called split. After typinghelp(''.split), we will have related help information:

>>>help(''.split)
Help on built-in function split:

split(...) method of builtins.str instance
S.split(sep=None, maxsplit=-1) -> list of strings

    Return a list of the words in S, using sep as the
delimiter string. If maxsplit is given, at most maxsplit
splits are done. If sep is not specified or is None, any
whitespace string is a separator and empty strings are
removed from the result.
>>>

We could try the following example:

>>> x="this is great"
>>>x.split()
['this', 'is', 'great']
>>>

Matrix manipulation is important when we deal with various matrices:

Data manipulation

The condition for equation (3) is that matrices A and B should have the same dimensions. For the product of two matrices, we have the following equation:

Data manipulation

Here,A is an n by k matrix (n rows and k columns), while B is a k by m matrix. Remember that the second dimension of the first matrix should be the same as the first dimension of the second matrix. In this case, it is k. If we assume that the individual data items in C, A, and B are Ci,j (the ith row and the jth column), Ai,j, and Bi,j, we have the following relationship between them:

Data manipulation

The dot() function from the NumPy module could be used to carry the preceding matrix multiplication:

>>>a=np.array([[1,2,3],[4,5,6]],float)    # 2 by 3
>>>b=np.array([[1,2],[3,3],[4,5]],float)  # 3 by 2
>>>np.dot(a,b)                            # 2 by 2
>>>print(np.dot(a,b))
array([[ 19.,  23.],
[ 43.,  53.]])
>>>

We could manually calculate c(1,1): 1*1 + 2*3 + 3*4=19.

After retrieving data or downloading data from the internet, we need to process it. Such a skill to process various types of raw data is vital to finance students and to professionals working in the finance industry. Here we will see how to download price data and then estimate returns.

Assume that we have n values of x1, x2, … and xn. There exist two types of means: arithmetic mean and geometric mean; see their genetic definitions here:

Data manipulation
Data manipulation

Assume that there exist three values of 2,3, and 4. Their arithmetic and geometric means are calculated here:

>>>(2+3+4)/3.
>>>3.0
>>>geo_mean=(2*3*4)**(1./3)
>>>round(geo_mean,4) 
2.8845

For returns, the arithmetic mean's definition remains the same, while the geometric mean of returns is defined differently; see the following equations:

Data manipulation
Data manipulation

In Chapter 3, Time Value of Money, we will discuss both means again.

We could say that NumPy is a basic module while SciPy is a more advanced one. NumPy tries to retain all features supported by either of its predecessors, while most new features belong in SciPy rather than NumPy. On the other hand, NumPy and SciPy have many overlapping features in terms of functions for finance. For those two types of definitions, see the following example:

>>> import scipy as sp
>>> ret=sp.array([0.1,0.05,-0.02])
>>>sp.mean(ret)
0.043333333333333342
>>>pow(sp.prod(ret+1),1./len(ret))-1 
0.042163887067679262

Our second example is related to processing theFama-French 3 factor time series. Since this example is more complex than the previous one, if a user feels it is difficult to understand, he/she could simply skip this example. First, a ZIP file called F-F_Research_Data_Factor_TXT.zip could be downloaded from Prof. French's Data Library. After unzipping and removing the first few lines and annual datasets, we will have a monthly Fama-French factor time series. The first few lines and last few lines are shown here:

DATE    MKT_RFSMBHMLRF
192607    2.96   -2.30   -2.87    0.22
192608    2.64   -1.40    4.19    0.25
192609    0.36   -1.32    0.01    0.23

201607    3.95    2.90   -0.98    0.02
201608    0.49    0.94    3.18    0.02
201609    0.25    2.00   -1.34    0.02

Assume that the final file is called ffMonthly.txt under c:/temp/. The following program is used to retrieve and process the data:

import numpy as np
import pandas as pd
file=open("c:/temp/ffMonthly.txt","r")
data=file.readlines()
f=[]
index=[]
for i in range(1,np.size(data)):
    t=data[i].split()
    index.append(int(t[0]))
    for j in range(1,5):
        k=float(t[j])
        f.append(k/100)
n=len(f) 
f1=np.reshape(f,[n/4,4])
ff=pd.DataFrame(f1,index=index,columns=['Mkt_Rf','SMB','HML','Rf'])

To view the first and last few observations for the dataset called ff, the functions of .head() and .tail()can be used:

Data manipulation
You have been reading a chapter from
Python for Finance - Second Edition
Published in: Jun 2017
Publisher:
ISBN-13: 9781787125698
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime