When you look at the requirements for an unstructured data store, Hadoop seems like a perfect choice: it is scalable, extensible, and very flexible. It can run on commodity hardware, has a vast ecosystem of tools, and appears to be cost effective to run. Hadoop uses a master-and-child-node model, in which data is distributed across multiple child nodes while the master node coordinates the jobs that run queries on that data. The Hadoop system is based on massively parallel processing (MPP), which makes it fast to run queries against all types of data, whether structured or unstructured.
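To make that division of labor concrete, here is a minimal word-count sketch in the style of Hadoop Streaming: the master node schedules map tasks on the child nodes that hold the input blocks, each task runs the mapper over its local block, and the sorted intermediate output is summed by reducer tasks. The file names (mapper.py, reducer.py) and the sample submission command in the comments are illustrative assumptions, not a specific cluster configuration.

```python
#!/usr/bin/env python3
# mapper.py -- each child node runs this over its local HDFS block and
# emits one "word<TAB>1" pair per word on standard output.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- Hadoop Streaming delivers mapper output sorted by key,
# so counts for the same word arrive on consecutive lines and can be
# summed in a single pass.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)

if current_word is not None:
    print(f"{current_word}\t{current_count}")

# Illustrative submission (jar location and paths are assumptions):
#   hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
#     -input /data/raw -output /data/wordcount \
#     -mapper mapper.py -reducer reducer.py \
#     -files mapper.py,reducer.py
```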
When a Hadoop cluster is created, each child node is provisioned with a block of attached disk storage called the local Hadoop Distributed File System (HDFS) disk store. You can run queries against the stored data using common processing frameworks such as Hive, Pig, and Spark (a Spark example is sketched below). However, data on the local disk persists only for the life of the associated...
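As a rough illustration of querying data on the local HDFS store with one of those frameworks, the PySpark sketch below reads a semi-structured JSON data set and aggregates it. The HDFS path and the event_type column are assumptions made purely for illustration.

```python
from pyspark.sql import SparkSession

# Minimal sketch: aggregate semi-structured JSON stored on the cluster's
# local HDFS. The path and the event_type column are illustrative only.
spark = SparkSession.builder.appName("hdfs-query-sketch").getOrCreate()

events = spark.read.json("hdfs:///data/events/")  # hypothetical HDFS path

# Spark pushes the scan and aggregation out to the child nodes that hold
# the underlying HDFS blocks, then collects the small result.
(events.groupBy("event_type")
       .count()
       .orderBy("count", ascending=False)
       .show())

spark.stop()
```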