Understanding Spark-based application architectures
Apache Spark is an emerging platform that leverages distributed storage and processing frameworks to support querying, reporting, analytics, and intelligent applications at scale. Spark SQL provides the features and mechanisms needed to access data across a range of data sources and formats, and to prepare it for downstream applications, whether the input is low-latency streaming data or high-throughput historical data stores. The following figure shows a high-level architecture that incorporates these requirements in typical Spark-based batch and streaming applications:
Additionally, as organizations start employing big data and NoSQL-based solutions across a number of projects, a data layer comprising RDBMSes alone is no longer considered the best fit for all the use cases in a modern enterprise application. RDBMS-only architectures, illustrated in the following figure, are rapidly disappearing across the industry, in order...