Engineering the solution
We will engineer the solution by breaking the problem down into several parts. In each part, we will perform a step that imports or transforms the data, and finally we will bring everything together to create the view. We will use Sqoop to load the customer master data from the MySQL RDBMS into Hive, and HDFS copy commands to load the Apache access logs and the tweets into Hadoop.
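To make the ingestion steps concrete, here is a minimal sketch that lists them in order as command lines: one Sqoop import of the customer table into Hive, then HDFS copies for the access logs and tweets. The JDBC URL, user name, table names, and local and HDFS paths are all hypothetical placeholders, not values from this chapter. The sketch only prints the commands by default. Pass `dry_run=False` to `run()` to execute them on a machine where `sqoop` and `hdfs` are on the `PATH`.

```python
import shlex
import subprocess

# Hypothetical connection details and paths; replace them with your environment's values.
MYSQL_JDBC_URL = "jdbc:mysql://localhost:3306/cosmetica"
MYSQL_USER = "cosmetica_user"
CUSTOMER_TABLE = "customer"
HIVE_TABLE = "customer"
LOCAL_ACCESS_LOGS = "/data/weblogs/access.log"
LOCAL_TWEETS = "/data/tweets/tweets.json"
HDFS_ACCESS_LOG_DIR = "/user/cosmetica/weblogs"
HDFS_TWEET_DIR = "/user/cosmetica/tweets"


def ingestion_commands():
    """Return the ingestion steps as argument lists, in execution order."""
    return [
        # Step 1: import the customer master table from MySQL directly into Hive.
        ["sqoop", "import",
         "--connect", MYSQL_JDBC_URL,
         "--username", MYSQL_USER,
         "-P",  # prompt for the password instead of passing it on the command line
         "--table", CUSTOMER_TABLE,
         "--hive-import",
         "--hive-table", HIVE_TABLE],
        # Step 2: create a target directory and copy the Apache access logs into HDFS.
        ["hdfs", "dfs", "-mkdir", "-p", HDFS_ACCESS_LOG_DIR],
        ["hdfs", "dfs", "-put", LOCAL_ACCESS_LOGS, HDFS_ACCESS_LOG_DIR],
        # Step 3: create a target directory and copy the collected tweets into HDFS.
        ["hdfs", "dfs", "-mkdir", "-p", HDFS_TWEET_DIR],
        ["hdfs", "dfs", "-put", LOCAL_TWEETS, HDFS_TWEET_DIR],
    ]


def run(dry_run=True):
    """Print each command; execute it too when dry_run is False."""
    for cmd in ingestion_commands():
        # shlex.join (Python 3.8+) renders a copy-pasteable shell line.
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(dry_run=True)
```

Keeping each command as an argument list rather than a single shell string avoids quoting problems when paths contain spaces. The same list serves both for printing and for `subprocess.run`.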
In the 360-degree view of the customer, we will combine the information from the following sources:
- Full name, gender, userID, and e-mail address from the customer master data, as the data from the system of record
- Brand names most frequently visited on Cosmetica's web shop, as the data from the web logs
- Tweets on certain topics, as the social media data
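The combination step can be illustrated with a small in-memory sketch. It joins customer master records with per-user brand visit counts parsed from Apache access log lines, plus the tweets, all keyed on userID. Several details here are assumptions made only for illustration, not details of Cosmetica's actual data: the userID appears in the log's authenticated-user field, brand pages live under a hypothetical `/brands/<name>` URL path, and each tweet record already carries a userID. At scale, this logic would run over the data loaded into Hadoop rather than over Python lists.

```python
import re
from collections import Counter, defaultdict

# Prefix of the Apache common/combined log format:
# host ident authuser [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)'
)
# Hypothetical URL scheme (assumption): brand pages live under /brands/<brand-name>
BRAND_PATH = re.compile(r"^/brands/([^/?]+)")


def brand_visits(log_lines):
    """Count successful brand-page visits per userID (assumed to be the authuser field)."""
    visits = defaultdict(Counter)
    for line in log_lines:
        m = LOG_PATTERN.match(line)
        if not m:
            continue  # skip lines that do not match the log format
        user, path, status = m.group(3), m.group(6), m.group(7)
        brand = BRAND_PATH.match(path)
        # "-" means no authenticated user; keep only 2xx responses.
        if user != "-" and brand and status.startswith("2"):
            visits[user][brand.group(1)] += 1
    return visits


def customer_360(customers, log_lines, tweets, top_n=3):
    """Build one record per customer combining master data, top brands, and tweets."""
    visits = brand_visits(log_lines)
    tweets_by_user = defaultdict(list)
    for t in tweets:
        # Assumption: tweets have already been mapped to a userID.
        tweets_by_user[t["userID"]].append(t["text"])
    view = {}
    for c in customers:
        uid = c["userID"]
        view[uid] = {
            "full_name": c["full_name"],
            "gender": c["gender"],
            "email": c["email"],
            "top_brands": [b for b, _ in visits[uid].most_common(top_n)],
            "tweets": tweets_by_user.get(uid, []),
        }
    return view
```

For example, given two `/brands/lumiere...` hits and one `/brands/verdant` hit for the same user, that user's `top_brands` comes out as `["lumiere", "verdant"]`. Anonymous requests (authuser `-`) are ignored because they cannot be tied to a customer.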
Bear in mind that we have used only a small set of data sources to create the 360-degree view. In practice, you should think of several data sources that can be used to build...