In this section, we will use Spark with one of the most popular NoSQL databases, MongoDB. MongoDB is a distributed document database that stores data in a JSON-like format. Unlike the rigid schemas of relational databases, the data structure in MongoDB is far more flexible, and the stored documents can have arbitrary fields. This flexibility, combined with its high availability and scalability features, makes it a good choice for storing data in many applications. It is also free and open-source software.
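To make the idea of arbitrary fields concrete, here is a minimal sketch in plain Python that needs no database connection. The two documents and their field names (`name`, `borough`, `grades`, `contact`) are hypothetical and are not taken from the dataset used in this section. They only illustrate that documents in the same collection need not share a structure.

```python
import json

# Hypothetical documents that could sit in the same collection.
# The field names are illustrative assumptions, not the real dataset schema.
schools = [
    {"name": "School A", "borough": "Brooklyn"},
    {
        "name": "School B",
        "borough": "Queens",
        "grades": ["9", "10"],              # extra array field
        "contact": {"phone": "555-0100"},   # extra nested document
    },
]


def field_names(documents):
    """Return the sorted union of top-level field names across documents."""
    names = set()
    for doc in documents:
        names.update(doc.keys())
    return sorted(names)


# Each document serializes to JSON independently of the others.
for doc in schools:
    print(json.dumps(doc))

# The union of fields is larger than the fields of the first document.
print(field_names(schools))
```

In a relational table, the second record would require adding columns for every row. In a document store, each record simply carries the fields it has.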
If you do not already have MongoDB installed, you can download it from https://www.mongodb.org/downloads. Follow the installation instructions for your operating system to install the database.
The New York City schools directory dataset for this example has been taken from the New York City Open Data website and can be downloaded...