Using Spark with MongoDB (NoSQL database)
In this section, we will use Spark with one of the most popular NoSQL databases, MongoDB. MongoDB is a distributed document database that stores data in a JSON-like format. Unlike the rigid schemas of relational databases, the data structure in MongoDB is much more flexible, and stored documents can have arbitrary fields. This flexibility, combined with high availability and scalability features, makes it a good choice for storing data in many applications. It is also free and open-source software.
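To make the idea of a flexible schema concrete, the following minimal Python sketch models two JSON-like documents as plain dictionaries. It does not connect to MongoDB, and the school names and field names are hypothetical, chosen only for illustration; they are not the columns of the actual dataset.

```python
import json

# Two hypothetical school documents (illustrative fields, not the real dataset).
# In a document database, records in the same collection need not share fields.
school_a = {
    "name": "Example High School",
    "borough": "Brooklyn",
    "grades": [9, 10, 11, 12],
}
school_b = {
    "name": "Sample Academy",
    "phone": "555-0100",
    "location": {"city": "New York", "zip": "10001"},  # nested sub-document
}

collection = [school_a, school_b]

# Fields present in every document versus fields that appear only in some.
common_fields = set.intersection(*(set(doc) for doc in collection))
all_fields = set.union(*(set(doc) for doc in collection))
optional_fields = sorted(all_fields - common_fields)

# Print each document as JSON, then summarize the field overlap.
for doc in collection:
    print(json.dumps(doc, sort_keys=True))
print("common:", sorted(common_fields))
print("optional:", optional_fields)
```

Here only the name field is shared by both documents. A relational table would need a nullable column for every other field, whereas a document store simply keeps each record as it is.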
Note
If you do not already have MongoDB installed, you can download it from https://www.mongodb.org/downloads. Follow the installation instructions for your specific OS to install the database.
The New York City schools directory dataset for this example has been taken from the New York City Open Data website and can be downloaded from https://nycplatform.socrata.com/data?browseSearch=&scope=&agency=&cat=education&type=datasets.
After you...