Chapter 11. Packaging Spark Applications
So far we have been working with a very convenient way of developing code in Spark: Jupyter notebooks. This approach is great when you want to develop a proof of concept and document what you do along the way.
However, Jupyter notebooks will not work if you need to schedule a job so that it runs, say, every hour. They also make it fairly hard to package your application, since it is not easy to split your script into logical chunks with well-defined APIs when everything sits in a single notebook.
In this chapter, we will learn how to write your scripts as reusable modules and how to submit jobs to Spark programmatically.
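To give a feel for where we are heading, here is a minimal sketch of a stand-alone PySpark script of the kind this chapter builds toward. The file name (word_count.py), the count_words helper, and the command-line input path are illustrative assumptions, not code from this book; the point is simply that the SparkSession is created explicitly, the logic lives behind a small, well-defined function, and the whole file can be handed to spark-submit.

```python
# word_count.py -- a hypothetical stand-alone PySpark script.
# It could be run outside a notebook with, for example:
#   spark-submit word_count.py <input_path>
import sys

from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def count_words(df, column):
    """Return word counts for the given text column."""
    words = df.select(F.explode(F.split(F.col(column), r"\s+")).alias("word"))
    return words.groupBy("word").count()


if __name__ == "__main__":
    spark = SparkSession.builder.appName("word_count").getOrCreate()
    # spark.read.text() yields a DataFrame with a single 'value' column
    lines = spark.read.text(sys.argv[1])
    count_words(lines, "value").show(10)
    spark.stop()
```

Because count_words takes and returns DataFrames, it can be imported and unit-tested on its own, which is exactly the kind of well-defined API that is hard to carve out of a single notebook.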
Before you begin, however, you might want to check out Bonus Chapter 2, Free Spark Cloud Offering, where we provide instructions on how to sign up for and use either Databricks' Community Edition or Microsoft's HDInsight Spark offering; the instructions can be found here: https://www.packtpub.com/sites/default...