Chapter 9. Working with Relational Databases
As we saw in the previous chapter, Hive is a great tool that provides a relational database-like view of the data stored in Hadoop. However, at the end of the day, it is not truly a relational database. It does not fully implement the SQL standard, and its performance and scale characteristics are vastly different (not better or worse, just different) from those of a traditional relational database.
In many cases, you will find a Hadoop cluster sitting alongside relational databases and used together with them, not instead of them. Business flows often require data to be moved from one store to the other; we will now explore such integration.
In this chapter, we will:
Identify some common Hadoop/RDBMS use cases
Explore how we can move data from an RDBMS into HDFS and Hive
Use Sqoop as a better solution for such data transfers
Move data with exports from Hadoop into an RDBMS
Wrap up with a discussion of how this can be applied to AWS