Chapter 6. Notebooks and Dataflows with Spark and Hadoop
Many tools are available for interactive analytics and visualization on Spark and Hadoop platforms. Some of the more important ones are the IPython Notebook (Jupyter), Spark Notebook, ISpark, Hue, Spark Kernel, Jove Notebook, Beaker Notebook, and Databricks Cloud. All of these are open source except Databricks Cloud. This chapter introduces some of the important notebook-based interactive analytics tools, along with a dataflow engine called NiFi, and shows how to use them. The chapter is divided into the following subtopics:
- Introducing web-based notebooks
- Introducing Jupyter
- Introducing Apache Zeppelin
- Using the Livy REST job server and Hue Notebooks
- Introducing Apache NiFi for dataflows