Not all use cases require Hadoop, and deploying it where it isn't needed can turn into a maintenance nightmare.
Hadoop should not be used if you need any of the following:
- To do graph-based data processing. You might have to bring in another Hadoop ecosystem product (say, Apache Giraph) to do this.
- To process data in real time. Hadoop's MapReduce model is batch-oriented; real-time processing can still be achieved with other products in the Hadoop ecosystem, but each case has to be analysed before deciding. Apache Flink or Apache Spark running on top of HDFS are options worth considering (see the first sketch after this list).
- To process data stored in relational databases. Running Apache Hive over HDFS is an option worth considering here, though (see the second sketch after this list).
- Access to shared state while processing data. Hadoop splits data across the nodes of a cluster and runs tasks in parallel; each task is stateless by design and cannot share mutable state with the others.
- To process small datasets. The overhead of distributing a job across a cluster outweighs any benefit when the data fits comfortably on a single machine.
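
As a concrete illustration of the real-time option above, here is a minimal sketch of a Spark Structured Streaming job that keeps a running word count over a live stream. It assumes PySpark is installed and that a text source is listening on `localhost:9999`; the host, port, and application name are illustrative assumptions, not anything prescribed by Hadoop itself.

```python
# Minimal sketch: near-real-time word count with Spark Structured Streaming.
# Assumptions: PySpark is installed and a text server (e.g. `nc -lk 9999`)
# is sending lines on localhost:9999.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("StreamingWordCount").getOrCreate()

# Read an unbounded stream of lines from a TCP socket.
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# Split each line into words and keep a running count per word.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Print the running counts to the console; a production job would
# typically write results back to HDFS or another sink instead.
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```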
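
Similarly, for the relational-data point, the sketch below shows SQL-style access to HDFS-resident data through Hive, driven here via Spark's Hive support rather than the Hive CLI. The `sales` table and its columns are hypothetical placeholders, and the snippet assumes a PySpark build with Hive support enabled.

```python
# Minimal sketch: querying HDFS-resident data with SQL via Hive support.
# Assumptions: PySpark built with Hive support; the `sales` table and its
# columns are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("HiveOverHDFS")
         .enableHiveSupport()
         .getOrCreate())

# Create a Hive-managed table whose data lives in HDFS.
spark.sql("CREATE TABLE IF NOT EXISTS sales (id INT, amount DOUBLE) USING hive")

# Query it with plain SQL instead of hand-written MapReduce code.
spark.sql("SELECT id, SUM(amount) AS total FROM sales GROUP BY id").show()
```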