Apache Spark in the cloud is a good fit for data scientists and data engineers who want to concentrate on the actual work without having to operate an Apache Spark cluster themselves.
We saw that Apache Spark in the cloud is much more than just installing Apache Spark on a couple of virtual machines. It comes as a whole package for the data scientist, completely based on open-source components, which makes it easy to migrate to other cloud providers or to local datacenters if necessary.
We also learned that a typical data science project draws on a wide variety of skills. Apache Spark in the cloud accommodates this by supporting all common programming languages and open-source data analytics frameworks on top of Apache Spark and Jupyter notebooks, and by eliminating the need for the operational skills otherwise required to maintain the Apache Spark cluster...