Workings of Sqoop
For your data lake, you will almost certainly need to ingest data from traditional applications and data sources. Because of its volume, this ingested data has to land in the Hadoop store. Apache Sqoop is one technology that lets you move data from these traditional enterprise data stores into Hadoop with ease.
SQL to Hadoop == SQOOP
The figure below (Figure 03) shows the basic workings of Apache Sqoop. Sqoop provides an import tool that moves data from an RDBMS into the Hadoop filesystem, and an export tool that moves data from the Hadoop filesystem back into an RDBMS.
Figure 03: Basic workings of Sqoop
In our use case, we will be importing the data stored in an RDBMS (PostgreSQL) into the Hadoop Distributed File System (HDFS). We will not look at Sqoop's export capability in detail, but we will briefly cover it in this chapter so that you have a good grasp of the different capabilities of this great tool.
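To make the two directions concrete, here is a minimal Python sketch that only assembles command lines for the classic `sqoop` command-line tool; it does not run them. The `--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, `--export-dir` and `--num-mappers` options are standard Sqoop arguments, but the helper functions, host, database, table, user, password-file path and HDFS directories below are hypothetical placeholders chosen for illustration.

```python
import shlex


def _sqoop_cmd(tool, host, port, database, table, username, password_file, mappers):
    """Common arguments shared by `sqoop import` and `sqoop export` (hypothetical helper)."""
    # PostgreSQL JDBC URL; Sqoop needs the PostgreSQL JDBC driver on its classpath.
    jdbc_url = f"jdbc:postgresql://{host}:{port}/{database}"
    return [
        "sqoop", tool,
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,  # avoids putting the password on the command line
        "--table", table,
        "--num-mappers", str(mappers),     # number of parallel map tasks
    ]


def sqoop_import_cmd(host, port, database, table, username, password_file, target_dir, mappers=1):
    """RDBMS table -> HDFS directory (the direction used in this chapter's use case)."""
    cmd = _sqoop_cmd("import", host, port, database, table, username, password_file, mappers)
    return cmd + ["--target-dir", target_dir]


def sqoop_export_cmd(host, port, database, table, username, password_file, export_dir, mappers=1):
    """HDFS directory -> existing RDBMS table (the reverse direction)."""
    cmd = _sqoop_cmd("export", host, port, database, table, username, password_file, mappers)
    return cmd + ["--export-dir", export_dir]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    cmd = sqoop_import_cmd("localhost", 5432, "sales", "customers", "etl_user",
                           "/user/etl/.pg_password", "/data/raw/customers")
    print(shlex.join(cmd))
    # sqoop import --connect jdbc:postgresql://localhost:5432/sales --username etl_user --password-file /user/etl/.pg_password --table customers --num-mappers 1 --target-dir /data/raw/customers
```

Passing the list to `subprocess.run` on a node where Sqoop and the PostgreSQL JDBC driver are installed would perform the actual transfer. Keeping the arguments as a list, rather than one string, avoids shell-quoting problems.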
As of writing this book, Sqoop has two variations (flavours) called by its major...