Technical requirements
Once again, this chapter has lots of technical examples, so you’re going to need the following:
- An internet-connected computer.
- A single- or multi-node Apache Spark cluster that is also running Jupyter. Docker is the easiest way to set this up, and we’ll cover how to do that in this chapter.
- Python 3.8+ with the pyarrow module and the dataset submodule installed.
- Your preferred coding IDE, such as Emacs, Vim, Sublime, or VS Code.