Why Hadoop plus Spark?
Apache Spark works best when it is combined with Hadoop. To understand why, let's take a look at the features that Hadoop and Spark each provide.
Hadoop features
Feature | Details |
---|---|
Unlimited scalability | Stores unlimited data by scaling out HDFS; effectively manages cluster resources with YARN; runs multiple applications along with Spark; supports thousands of simultaneous users |
Enterprise grade | Provides security with Kerberos authentication and ACL-based authorization; data encryption; high reliability and integrity; multi-tenancy |
Wide range of applications | Files: structured, semi-structured, and unstructured; streaming sources: Flume and Kafka; databases: any RDBMS and NoSQL database |
Spark features
Feature | Details |
---|---|
Easy development | No boilerplate coding; native APIs in multiple languages, such as Java, Scala, Python, and R; REPL for Scala, Python, and R |
Optimized performance | Caching; optimized shuffle; Catalyst Optimizer |
Unification | Batch, SQL, machine learning, streaming, and graph processing... |