Executing parallel jobs using Oozie (fork)
In this recipe, we are going to take a look at how to execute parallel jobs using the Oozie fork node. Here, we will execute one Hive job and one Pig job in parallel.
Getting ready
To perform this recipe, you should have a running Hadoop cluster with recent versions of Oozie, Hive, and Pig installed on it.
How to do it...
For parallel execution, we need to use the fork node provided by Oozie. A fork splits the execution path into multiple concurrent paths, each starting at the node named in its path element. The following is a sample workflow that executes the Hive and Pig jobs in parallel:
<workflow-app xmlns="uri:oozie:workflow:0.2" name="demo-wf">
    <start to="fork-node"/>
    <fork name="fork-node">
        <path start="pig-node"/>
        <path start="hive-node"/>
    </fork>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/pig"/>
            </prepare>
            <configuration...
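The listing above is truncated, but note that in a complete workflow every Oozie fork must be matched by a join node: the join waits until all forked paths have completed before execution continues. The following is a minimal sketch of how such a workflow typically closes (the join-node, fail, and end names here are illustrative, not part of the listing above):

```xml
    <!-- Both pig-node and hive-node must transition to the same join. -->
    <!-- The join fires only after every forked path has finished. -->
    <join name="join-node" to="end"/>

    <!-- A kill node reports failure if any action transitions to it. -->
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name="end"/>
</workflow-app>
```

If any of the forked actions fails and transitions to the kill node, the whole workflow is marked as failed; otherwise, control reaches the end node once the join releases.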