So what's in it for Apache Spark here? Let's assume we have a set of powerful nodes in our local data center. What is the advantage of using Kubernetes for deployment over just installing Apache Spark on bare metal? Let's turn the question around and look at the disadvantages of using Kubernetes in this scenario. In terms of performance, there is practically none.
A 2014 IBM Research paper, available at http://domino.research.ibm.com/library/cyberdig.nsf/papers/0929052195DD819C85257D2300681E7B/$File/rc25482.pdf, reports that performance within a Docker container is nearly identical to bare metal.
This means that the only real disadvantage is the effort you invest in installing and maintaining Kubernetes. What you gain in return is the following:
- Easy installation and updates of Apache...