There are many different types of data, such as integer, real number, or string. The following table offers a list of those data types:
Table 1.1 List of different data types
In the following examples, we assign a value to r, which is a scalar, and several values to pv, which is an array (vector). The type() function is used to show their types:
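A minimal sketch of such an assignment is given here. The specific numbers are assumptions chosen for illustration, and NumPy is assumed for the array:

```python
import numpy as np

r = 0.023                        # a scalar (assumed value)
pv = np.array([100, 300, 500])   # a vector of values (assumed values)

print(type(r))    # <class 'float'>
print(type(pv))   # <class 'numpy.ndarray'>
```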
To choose the appropriate precision, we use the round() function; see the following example:
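As a short illustration, with an input value that is an assumption rather than one taken from the text, the second argument of the built-in round() sets the number of decimal places:

```python
x = 7 / 3                  # 2.3333... (assumed input)

two_places = round(x, 2)   # keep two decimal places
four_places = round(x, 4)  # keep four decimal places
nearest_int = round(x)     # no second argument: nearest integer

print(two_places)    # 2.33
print(four_places)   # 2.3333
print(nearest_int)   # 2
```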
For data manipulation, let's look at some simple operations:
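The operations below are one possible sketch of such simple manipulations, assuming NumPy is available. The variable names and sizes are illustrative only:

```python
import numpy as np

a = np.zeros(5)                # array with five zeros
b = np.ones((2, 3))            # 2 by 3 matrix filled with ones
c = np.arange(1, 10, 3)        # from 1 up to (not including) 10, step 3
d = np.eye(3)                  # 3 by 3 identity matrix
e = np.array([1, 2, 3]) * 2    # element-wise multiplication

print(c)   # [1 4 7]
print(e)   # [2 4 6]
```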
Some so-called dot functions are quite handy and useful:
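Here, "dot functions" is taken to mean methods called on an object with the dot notation. A minimal sketch with a NumPy array follows; the values in pv are assumptions:

```python
import numpy as np

pv = np.array([100, 300, 500])   # assumed values

total = pv.sum()      # sum of all elements
average = pv.mean()   # arithmetic mean
largest = pv.max()    # maximum value
spread = pv.std()     # standard deviation (population, by default)

print(total)     # 900
print(average)   # 300.0
print(largest)   # 500
```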
Anything after the number sign (#) is treated as a comment. Arrays are another important data type:
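The following sketch shows both points: each trailing comment starts with #, and a two-dimensional NumPy array is created and reshaped. The values are assumptions for illustration:

```python
import numpy as np

pv2 = np.array([[1, 2, 3], [4, 5, 6]])   # 2 by 3 array
flat = pv2.flatten()                     # one-dimensional copy
reshaped = flat.reshape(3, 2)            # 3 by 2 array, filled row by row

print(pv2.shape)   # (2, 3)
print(flat)        # [1 2 3 4 5 6]
print(reshaped)
# [[1 2]
#  [3 4]
#  [5 6]]
```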
We could assign a string to a variable:
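For instance, with a string value that is an assumption for illustration:

```python
t = "This is great"   # assign a string to a variable

print(type(t))     # <class 'str'>
print(len(t))      # 13
print(t.upper())   # THIS IS GREAT
```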
To find out all string-related functions, we use dir(''); see the following code:
>>>dir('')
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
>>>
For example, from the preceding list we see a function called split. After typing help(''.split), we will see the related help information:
We could try the following example:
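A short sketch of split() with input strings that are assumptions for illustration. Without an argument it splits on whitespace; with an argument it splits on that separator:

```python
x = "this is great"
words = x.split()            # split on whitespace
parts = "2,3,4".split(",")   # split on a comma

print(words)   # ['this', 'is', 'great']
print(parts)   # ['2', '3', '4']
```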
Matrix manipulation is important when we deal with various matrices:
The condition for equation (3) is that matrices A and B should have the same dimensions. For the product of two matrices, we have the following equation:
Here, A is an n by k matrix (n rows and k columns), while B is a k by m matrix (k rows and m columns), so C is an n by m matrix. Remember that the second dimension of the first matrix should be the same as the first dimension of the second matrix. In this case, it is k. If we assume that the individual data items in C, A, and B are Ci,j (the ith row and the jth column), Ai,j, and Bi,j, we have the following relationship between them:
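For reference, the standard element-wise definitions of the matrix sum and product can be written as follows. This is a reconstruction from the description above using the textbook definitions; the summation index t is an assumption, since the text does not name it:

```latex
% Sum: A and B must have the same dimensions
C = A + B, \qquad C_{i,j} = A_{i,j} + B_{i,j}

% Product: A is n \times k, B is k \times m, so C is n \times m
C = A B, \qquad C_{i,j} = \sum_{t=1}^{k} A_{i,t}\, B_{t,j}
```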
The dot() function from the NumPy module could be used to carry out the preceding matrix multiplication:
We could verify c(1,1) manually as the first row of the first matrix times the first column of the second matrix: 1*1 + 2*3 + 3*4 = 19.
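The multiplication can be sketched as follows. The first row of a and the first column of b are chosen to match the manual calculation of c(1,1); the remaining entries are assumptions for illustration:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)     # 2 by 3
b = np.array([[1, 2], [3, 3], [4, 5]], dtype=float)   # 3 by 2
c = np.dot(a, b)                                      # 2 by 2

print(c)
# [[19. 23.]
#  [43. 53.]]
```

The inner dimensions (3 and 3) match, so the result has the outer dimensions, 2 by 2.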
After retrieving or downloading data from the internet, we need to process it. The ability to process various types of raw data is vital to finance students and to professionals working in the finance industry. Here we will see how to download price data and then estimate returns.
Assume that we have n values: x1, x2, ..., xn. There are two types of means: the arithmetic mean and the geometric mean; see their generic definitions here:
Assume that there are three values: 2, 3, and 4. Their arithmetic and geometric means are calculated here:
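A minimal sketch of this calculation, assuming NumPy. The arithmetic mean is the sum divided by n, and the geometric mean is the nth root of the product:

```python
import numpy as np

x = np.array([2.0, 3.0, 4.0])
n = len(x)

arithmetic_mean = x.sum() / n          # (2 + 3 + 4) / 3
geometric_mean = x.prod() ** (1 / n)   # (2 * 3 * 4) ** (1/3)

print(arithmetic_mean)            # 3.0
print(round(geometric_mean, 4))   # 2.8845
```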
For returns, the arithmetic mean's definition remains the same, while the geometric mean of returns is defined differently; see the following equations:
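In the usual notation, with R_i denoting the return in period i (a symbol assumed here, since the text does not name it), the two means of returns are commonly written as:

```latex
% Arithmetic mean of returns
\bar{R}_{\text{arithmetic}} = \frac{1}{n} \sum_{i=1}^{n} R_i

% Geometric mean of returns: compound the gross returns, take the nth root, subtract 1
\bar{R}_{\text{geometric}} = \left[ \prod_{i=1}^{n} (1 + R_i) \right]^{1/n} - 1
```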
In Chapter 3, Time Value of Money, we will discuss both means again.
We could say that NumPy is a basic module while SciPy is a more advanced one. NumPy tries to retain all features supported by either of its predecessors (Numeric and Numarray), while most new features belong in SciPy rather than NumPy. That said, NumPy and SciPy overlap considerably in the functions they offer for finance. For the two definitions of the mean of returns, see the following example:
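A sketch of both means for a series of returns, using NumPy only. The return values are assumptions for illustration, not data from the text:

```python
import numpy as np

ret = np.array([0.05, 0.11, -0.03])   # assumed periodic returns
n = len(ret)

arith = ret.mean()                           # arithmetic mean of returns
geo = np.prod(1.0 + ret) ** (1.0 / n) - 1.0  # geometric mean of returns

print("arithmetic mean:", arith)
print("geometric mean: ", geo)
```

For returns that are not all identical, the geometric mean is below the arithmetic mean.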
Our second example is related to processing the Fama-French 3-factor time series. Since this example is more complex than the previous one, readers who find it difficult to understand can simply skip it. First, a ZIP file called F-F_Research_Data_Factor_TXT.zip can be downloaded from Prof. French's Data Library. After unzipping it and removing the first few lines and the annual datasets, we will have a monthly Fama-French factor time series. The first few lines and last few lines are shown here:
Assume that the final file is called ffMonthly.txt under c:/temp/. The following program is used to retrieve and process the data:
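One way to write such a program is sketched below with pandas. Several things here are assumptions rather than details confirmed by the text: the helper name load_ff_monthly, the column labels, and the file layout (one header line, then whitespace-separated rows holding a YYYYMM date and four factor values in percent). The sample rows are placeholders used only to show that the code runs; they are not real factor values:

```python
import io
import pandas as pd

def load_ff_monthly(source, skiprows=1):
    """Read a cleaned monthly Fama-French file into a DataFrame.

    Assumed layout: `skiprows` header lines, then rows of
    YYYYMM, Mkt-RF, SMB, HML, RF separated by whitespace, in percent.
    """
    cols = ["Mkt_Rf", "SMB", "HML", "Rf"]   # hypothetical column labels
    ff = pd.read_csv(
        source,
        sep=r"\s+",
        skiprows=skiprows,
        header=None,
        names=["date"] + cols,
        index_col="date",
    )
    return ff / 100.0   # convert percent to decimal

# Placeholder rows for demonstration only; NOT real factor values.
sample = io.StringIO(
    "Mkt-RF SMB HML RF\n"
    "200001 1.00 2.00 -3.00 0.40\n"
    "200002 -0.50 0.25 1.50 0.40\n"
)
ff = load_ff_monthly(sample)
print(ff)

# With the file described in the text, the call would be:
# ff = load_ff_monthly("c:/temp/ffMonthly.txt")
```

If the cleaned file keeps more or fewer header lines, the skiprows argument should be adjusted accordingly.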
To view the first and last few observations of the dataset called ff, the .head() and .tail() functions can be used:
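The behavior of these two functions can be sketched on a small stand-in DataFrame. The column name and values are placeholders, not the Fama-French data:

```python
import pandas as pd

# Stand-in for the ff dataset: ten rows of placeholder values
ff_demo = pd.DataFrame({"x": range(1, 11)})

first_rows = ff_demo.head()    # first 5 rows by default
last_rows = ff_demo.tail(3)    # last 3 rows

print(first_rows)
print(last_rows)
```

Both functions accept an optional number of rows; without it, five rows are returned.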