Limitations
Unfortunately, adding more machines to a cluster doesn't mean better performance. Yes, it can yield response time; however, the time for executing a particular request for a service will remain the same. That is because the request will be piped to a single machine (eventually), and that machine is responsible for fetching the required data from the database—be it a few records or thousands—and solely processing them. Distributed computing architectures, such as Hadoop, help utilize the power of parallel processing for all the machines by breaking up the data into parts and distributing them into cluster machines to be processed in parallel using the MapReduce concept. The power of Hadoop resides in the concept of data locality, where the database is accessed once and the result is fetched, divided into parts, and distributed to each machine for processing. Machines, in this case, do not need to query the database and this prevents networking latency. Instead, they work on the...