Summary
In this chapter, we presented how to execute and schedule jobs. Since our data processing workflows can depend on a number of applications, we showed how to chain together and coordinate the execution of multiple tasks. We then proceeded to configure our jobs using both property files and Hadoop parameters.
We also covered monitoring and optimizing job execution. Finally, we presented two more techniques: using slim JARs to streamline the deployment process, and throttling job execution.
In the next chapter, we will see how to read data from and store data to external data sources.