
Apache Spark 2.3 now has native Kubernetes support!

  • 2 min read
  • 07 Mar 2018

Two leading open-source projects, Apache Spark and Kubernetes, now work together: Apache Spark 2.3 has native Kubernetes support.

Kubernetes: A natural fit for Apache Spark

Apache Spark is a framework for large-scale data processing and an important tool for data scientists. It offers a robust platform for data transformation, analytics, and machine learning. Recently, data scientists have been adopting containers to improve their workflows, because containers package dependencies together with the application and produce reproducible artifacts.

This is where Kubernetes comes in. Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications, and it lets Spark applications run as containers on a cluster. The combination has two benefits. First, data scientists keep their principal tool, Apache Spark, and its ability to manage distributed data processing tasks. Second, they can work with containers through the Kubernetes API.

With Apache Spark 2.3, users can run Spark workloads in an existing Kubernetes 1.7+ cluster. Spark workloads can therefore use Kubernetes clusters directly for multi-tenancy and sharing through Namespaces and Quotas, and they can use administrative features such as Pluggable Authorization and Logging. Spark workloads also require no changes or new installations on the Kubernetes cluster: one simply creates a container image and sets up the right RBAC roles for the Spark application, and it is ready to run.
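To make the two setup steps concrete, here is a minimal Python sketch that only assembles the command lines involved; it does not run them. It is an illustration, not the article's own procedure. The helper names, the API server URL, the image name, and the jar path are placeholders chosen for this example. The `spark-submit` options and `spark.kubernetes.*` properties follow the Spark 2.3 "Running on Kubernetes" documentation, and granting the `edit` role is one possible RBAC choice that a given cluster may need to tighten.

```python
import shlex


def rbac_commands(namespace="default", service_account="spark"):
    """Build kubectl commands that create a service account for the Spark
    driver and bind it to the built-in 'edit' cluster role (an assumption;
    a real cluster may call for a narrower role)."""
    return [
        ["kubectl", "create", "serviceaccount", service_account,
         "--namespace", namespace],
        ["kubectl", "create", "clusterrolebinding", f"{service_account}-role",
         "--clusterrole=edit",
         f"--serviceaccount={namespace}:{service_account}",
         "--namespace", namespace],
    ]


def spark_submit_args(api_server, image, app_jar, main_class,
                      namespace="default", service_account="spark",
                      executors=2, name="spark-app"):
    """Build a spark-submit command line targeting a Kubernetes cluster.

    All argument values are supplied by the caller; nothing is validated
    against a live cluster.
    """
    conf = {
        "spark.executor.instances": str(executors),
        "spark.kubernetes.namespace": namespace,
        "spark.kubernetes.container.image": image,
        "spark.kubernetes.authenticate.driver.serviceAccountName": service_account,
    }
    args = [
        "spark-submit",
        "--master", f"k8s://{api_server}",  # the k8s:// prefix selects Kubernetes
        "--deploy-mode", "cluster",         # the driver runs in a pod on the cluster
        "--name", name,
        "--class", main_class,
    ]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_jar)  # a local:// URI refers to a path inside the image
    return args


if __name__ == "__main__":
    # Placeholder values for illustration only.
    commands = rbac_commands() + [spark_submit_args(
        api_server="https://k8s.example.com:6443",
        image="example/spark:2.3.0",
        app_jar="local:///opt/spark/examples/jars/spark-examples.jar",
        main_class="org.apache.spark.examples.SparkPi",
    )]
    for command in commands:
        print(" ".join(shlex.quote(part) for part in command))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; the output can be reviewed and then pasted into a shell that has `kubectl` and a Spark 2.3 distribution available.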


Native Kubernetes support offers fine-grained management of Spark applications, improved elasticity, and seamless integration with logging and monitoring solutions. The community is also exploring advanced use cases such as managing streaming workloads and leveraging service meshes like Istio.

Visit the Databricks blog to read more on this topic.