Summary
In this chapter, we were introduced to big data and its uses in finance. Big data tools provide the scalability and reliability of analyzing big data in the area of risk and credit analytics, handling data coming in from multiple sources. Apache Hadoop is one popular tool for financial institutions and enterprises in meeting these big data needs.
Apache Hadoop is an open source framework and written in Java. To help us get started quickly with Hadoop, we downloaded a QuickStart VM from Cloudera that comes with CentOS and Hadoop 2.0 running on VirtualBox. The main components in Hadoop are the HDFS file store, YARN, and MapReduce. We learned about Hadoop by writing a map and reduce program in Python to perform a word count on an e-book. Moving on, we downloaded a dataset of the daily prices of a stock and counted the number of percentage intraday price changes. The outputs were taken for further analysis.
Before we begin to manage big data, we will need an avenue to store this data....