Transforming the data with Hive
The data is now in HDFS on the cluster. We'll now transform it with a SQL script using Hive, a program that interacts with the data through SQL statements.
With most Hadoop programs (Hive, Pig, Spark, and so on), the source data is read-only: we cannot modify the data in the file that we transferred in the previous recipe. Some tools, such as HBase, do allow us to modify the stored data, but for our purpose we'll use Hive, a well-known program in the Hadoop ecosystem.
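To make the idea concrete, here is a minimal HiveQL sketch of this pattern; the table names, columns, delimiter, and HDFS path below are assumptions for illustration, not the ones used later in this recipe. The file transferred in the previous recipe would be exposed as an external table, and the transformed result written to a new table, so the source file is never modified:

-- Hypothetical staging table over the file loaded in the previous recipe.
-- Path, delimiter, and columns are assumptions for illustration only.
CREATE EXTERNAL TABLE IF NOT EXISTS staging_fact_orders (
    order_id   INT,
    order_date STRING,
    amount     DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION '/user/sandbox/factorders';

-- The transformation writes its result to a new Hive table;
-- the source file in HDFS stays untouched (read-only).
CREATE TABLE fact_orders_transformed
STORED AS ORC
AS
SELECT order_id,
       CAST(order_date AS DATE) AS order_date,
       amount
FROM   staging_fact_orders
WHERE  amount > 0;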
Getting ready
This recipe assumes that you have access to a Hortonworks cluster and that you have transferred data to it following the previous recipe.
How to do it...
- If not already done, open the package created in the previous recipe, FactOrdersToHDPCuster.dtsx.
- Add a Hadoop Hive task and rename it hht_HDPDWHiveTable.
- Double-click on it to open the Hadoop Hive Task Editor, as shown in the following screenshot. Update the following parameters:
  - HadoopConnection: cmgr_Hadoop_Sandbox...