Nowadays, there's a good chance that the training and test data are hosted in some cloud storage system. In this section, we are going to learn how to ingest data with Apache Spark from an object storage service such as Amazon S3 (https://aws.amazon.com/s3/) or an S3-compatible one such as Minio (https://www.minio.io/). Amazon Simple Storage Service (more popularly known as Amazon S3) is an object storage service that is part of the AWS cloud offering. While S3 is available in the public cloud, Minio is a high-performance distributed object storage server, compatible with the S3 protocol and standards, that has been designed for large-scale private cloud infrastructures.
We need to add the Spark Core and Spark SQL dependencies to the Scala project, along with the following (a configuration and read sketch follows after the dependency list):
groupId: com.amazonaws
artifactId: aws-java-sdk-core
version: 1.11.234
groupId: com.amazonaws...
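Once the dependencies are in place, the Spark application can be pointed at the object storage through the S3A connector. The following is a minimal sketch, assuming the hadoop-aws and AWS SDK JARs are available on the classpath; the endpoint (a local Minio server here), access key, secret key, bucket, and file names are placeholders that you would replace with your own values:

import org.apache.spark.sql.SparkSession

object S3IngestionExample {
  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder()
      .master("local[*]")
      .appName("S3 data ingestion")
      .getOrCreate()

    // Configure the S3A connector. For Minio, fs.s3a.endpoint is the address of
    // the Minio server; for Amazon S3 the default endpoint can be used instead.
    val hadoopConf = sparkSession.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3a.endpoint", "http://localhost:9000")   // hypothetical Minio endpoint
    hadoopConf.set("fs.s3a.access.key", "<YOUR_ACCESS_KEY>")     // placeholder credentials
    hadoopConf.set("fs.s3a.secret.key", "<YOUR_SECRET_KEY>")
    hadoopConf.set("fs.s3a.path.style.access", "true")           // typically required by Minio
    hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

    // Read a CSV file from a bucket into a DataFrame (bucket and file are placeholders).
    val df = sparkSession.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("s3a://my-bucket/training-data.csv")

    df.printSchema()
    df.show(10)

    sparkSession.stop()
  }
}

The same s3a:// URI scheme works for both Amazon S3 and Minio, so switching between a public bucket and a private deployment is just a matter of changing the endpoint and credentials.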