An overview of Spark
In this section, we will talk about Spark and its emergence as one of the leading frameworks for various kinds of Big Data use cases. We will also talk about the various features of Spark and its applicability in different scenarios.
Another distributed framework for crunching large data? Another version of Hadoop?
These are the first questions that come to mind when we hear about Spark for the first time, but neither is accurate, and there is no real substance to them. We will return to these questions shortly, but before that, let's first understand batch data processing and real-time data processing.
Batch data processing
Batch data processing is the process of defining a series of jobs that are executed one after another, or in parallel, to achieve a common goal. These jobs are mostly automated and run without manual intervention. They collect the input data and process it in batches, where the size of each batch can vary from a few GBs to TBs or PBs. These...
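The batch model described above can be illustrated in plain Python. This is a minimal, illustrative sketch and not Spark code: the job functions (`clean_job`, `enrich_job`, `summarize_job`), the helpers `make_batches`, `run_pipeline` and `run_batch_processing`, and the tiny batch size are all hypothetical, chosen only to show automated jobs applied to input data batch by batch. Within each batch the jobs run one after another; the batches themselves can be processed either sequentially or in parallel using the standard library's `concurrent.futures.ThreadPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List

# Hypothetical type alias: a job takes one batch and returns a new batch.
Job = Callable[[List[int]], List[int]]


def make_batches(records: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Group incoming records into fixed-size batches (the last one may be smaller)."""
    batch: List[int] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def clean_job(batch: List[int]) -> List[int]:
    # Hypothetical job: drop invalid (negative) records.
    return [r for r in batch if r >= 0]


def enrich_job(batch: List[int]) -> List[int]:
    # Hypothetical job: apply a simple transformation to each record.
    return [r * 10 for r in batch]


def summarize_job(batch: List[int]) -> List[int]:
    # Hypothetical job: reduce the batch to a single aggregate value.
    return [sum(batch)]


def run_pipeline(batch: List[int], jobs: List[Job]) -> List[int]:
    """Run every job on one batch, one after another, with no manual steps."""
    for job in jobs:
        batch = job(batch)
    return batch


def run_batch_processing(
    records: Iterable[int], batch_size: int, jobs: List[Job], parallel: bool = False
) -> List[int]:
    """Split the input into batches and push each batch through the job pipeline."""
    batches = list(make_batches(records, batch_size))
    if parallel:
        # Batches are independent, so they can be processed concurrently;
        # pool.map returns results in the same order as the input batches.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda b: run_pipeline(b, jobs), batches))
    else:
        results = [run_pipeline(b, jobs) for b in batches]
    # Combine the per-batch results into a single output.
    return [value for result in results for value in result]


if __name__ == "__main__":
    data = [1, -2, 3, 4, 5, -6, 7]
    jobs: List[Job] = [clean_job, enrich_job, summarize_job]
    print(run_batch_processing(data, batch_size=3, jobs=jobs))  # → [40, 90, 70]
    print(run_batch_processing(data, batch_size=3, jobs=jobs, parallel=True))  # → [40, 90, 70]
```

Here the seven input records form three batches (`[1, -2, 3]`, `[4, 5, -6]` and `[7]`), and each batch yields one aggregate value. In a real Big Data system, each batch would be gigabytes or more and the jobs would run on a cluster rather than in a single process.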