Going deeper – Hadoop for finance
Now that we know how to use Hadoop to perform a simple word count on a fairly large text file, we can go a step further and use Hadoop for quantitative analysis. To start, we can count how often each intraday percentage price change has occurred in the history of a stock.
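To make the goal concrete, here is a minimal sketch in plain Python, without Hadoop, of what such a count could look like. It assumes that the intraday change is measured from the day's open to its close and rounded to a whole percent. The function names intraday_pct_change and count_pct_changes and the sample prices are hypothetical and serve only as an illustration.

```python
from collections import Counter


def intraday_pct_change(open_price, close_price):
    """Return the open-to-close change as a whole-number percentage."""
    # Assumption: "intraday change" means (close - open) / open.
    return int(round((close_price - open_price) / open_price * 100.0))


def count_pct_changes(rows):
    """Count how often each whole-percent change occurs.

    rows is an iterable of (open_price, close_price) pairs.
    """
    return Counter(intraday_pct_change(o, c) for o, c in rows)


if __name__ == "__main__":
    # Made-up prices for illustration only, not real IBM data.
    sample = [(100.0, 101.0), (100.0, 99.0), (200.0, 202.0), (50.0, 50.0)]
    for pct, count in sorted(count_pct_changes(sample).items()):
        print("%d\t%d" % (pct, count))
```

Each output line pairs a percentage change with the number of days on which it occurred, which is the same key and count shape that the word count example produced.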
Obtaining IBM stock prices from Yahoo! Finance
To obtain a dataset, we can use the historical stock prices available from Yahoo! Finance. Using Firefox or any other web browser in your CentOS environment, you can download the historical daily prices of a stock as a CSV file from the following link:
http://ichart.finance.yahoo.com/table.csv?s=IBM
In this example, we will use IBM as our stock. Download the file to the Downloads folder of your home directory and rename it to ibm.csv. If we take a look at the contents of the CSV file, the daily stock prices go all the way back to 1962.
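One way to check the date range without opening the file by hand is the short Python 3 sketch below. It assumes that the CSV file has a header row containing a Date column with values in YYYY-MM-DD form. The function name date_range and the sample rows are hypothetical; they are not part of the downloaded data.

```python
import csv
import io


def date_range(csv_file):
    """Return the (earliest, latest) values found in the Date column."""
    reader = csv.DictReader(csv_file)
    dates = [row["Date"] for row in reader]
    # Assumption: dates are YYYY-MM-DD strings, which sort
    # chronologically when compared as plain text.
    return min(dates), max(dates)


if __name__ == "__main__":
    # Made-up rows for illustration only, not real IBM prices.
    sample = io.StringIO(
        "Date,Open,Close\n"
        "2015-01-05,10.0,10.5\n"
        "1962-01-02,1.0,1.1\n"
    )
    print(date_range(sample))
```

To run it against the downloaded file, open ibm.csv with open() and pass the file object to date_range in place of the sample.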
Then run the following command in the Terminal to copy our target CSV file to the Hadoop HDFS file store:
[cloudera...