Summary
We saw that containerization is becoming the standard way to package and deploy applications, and that managing a large number of containers requires orchestration. We learned that Kubernetes is such an orchestrator, and we saw how it can be used to create a simple Apache Spark cluster within minutes. We used a local test installation to experiment with Kubernetes, but the example shown can be used out of the box on any Kubernetes installation.
We also saw that using Kubernetes as a cloud service can be very beneficial in conjunction with Apache Spark: because the underlying Docker containers in the cloud are charged on an hourly basis, the Apache Spark cluster can be grown and shrunk elastically, on the fly, as needed.
Finally, using Kubernetes both in the cloud and in the local data center enables a broader usage scenario, the so-called hybrid cloud approach. In such an approach, depending on constraints such as data protection laws and resource requirements, an Apache Spark cluster...