Implementation and deployment
Implementation depends on having the big data infrastructure set up first. Before proceeding, verify that your MongoDB installation is running properly. The implementation objectives are as follows:
- Splitting data into test, train and validation datasets
- Data ingestion
- Data analysis
Implementation objectives
The overall objective is to perform data analysis on an on-time flight dataset covering the years 2007 and 2008. Of the 2007 flight data, 80% will be used as the training dataset and the remaining 20% as the validation dataset. For model performance evaluation, all of the 2008 flight data serves as the testing dataset.
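The split described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the pipeline's Spark code: the function name `split_train_validation`, the seed value, and the synthetic `(year, row_id)` tuples standing in for flight records are all assumptions made for the example. In Spark, the analogous operation would typically be `DataFrame.randomSplit`, which produces approximate rather than exact proportions.

```python
import random


def split_train_validation(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of `records` and cut it into (train, validation).

    Hypothetical helper for illustration; a fixed seed keeps the split
    reproducible across runs.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)               # copy, so the input is not mutated
    random.Random(seed).shuffle(shuffled)  # seeded, deterministic shuffle
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Synthetic stand-ins for flight rows (assumption: real rows have many columns).
flights_2007 = [(2007, i) for i in range(1000)]
flights_2008 = [(2008, i) for i in range(500)]

# 80% of 2007 for training, the remaining 20% for validation.
train, validation = split_train_validation(flights_2007)

# All of the 2008 data is held out for testing.
test = flights_2008

print(len(train), len(validation), len(test))
```

Shuffling before cutting avoids a split that follows the file's ordering (for example, by date), and holding out an entire later year for testing keeps the evaluation data fully separate from anything the model saw during training or validation.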
The following steps are required to implement the flight prediction model:
- Download the flight dataset.
- You may develop the pipeline in four ways:
  - Incrementally in your local Spark shell
  - By firing up your Hortonworks Sandbox on a virtual machine managed by your host machine, and developing code in a powerful Zeppelin Notebook environment...