The architecture of Spark SQL
In this section, we discuss the overall design, architecture, and components of Spark SQL. This will help us understand its features and capabilities.
The emergence of Spark SQL
Storing data in Relational Database Management Systems (RDBMS) such as Oracle and MySQL, and querying it with SQL, is a well-known, industry-wide standard for analyzing data collected from sources such as online portals and surveys.
This approach worked well only while the data remained reasonably small, that is, no more than a few GB. Once the data grew to terabytes, it became a nightmare: SQL queries would take hours, sometimes never complete, and at times crash the whole system.
That's where Apache Hadoop (https://en.wikipedia.org/wiki/Apache_Hadoop) was introduced as a distributed, scalable, fault-tolerant, parallel, and batch processing...