Data collection, processing, and cleaning
In this stage, you begin by gathering raw data from the identified sources. You then write data pipelines to clean and prepare the raw data for analysis.
Understanding data sources, location, and the format
You have started working with the SME to access a subset of the flight data. You need to understand the data format and the integration process required to access this data. The data could be in CSV format, or it may be stored in a relational database management system (RDBMS). It is vital to understand how this data will be made available to your project and how it will be maintained over time.
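To make the CSV-versus-RDBMS distinction concrete, the following is a minimal sketch of reading the same kind of records from either source into a common shape. It uses only the Python standard library, with an in-memory SQLite database standing in for whatever RDBMS your organization actually uses. The table name `flights`, the column names, and the sample row are hypothetical and exist only for illustration.

```python
import csv
import io
import sqlite3

# Made-up sample data; the column names are assumptions, not the real schema.
SAMPLE_CSV = (
    "flight_id,scheduled_departure,actual_departure\n"
    "XX100,2024-01-01T08:00,2024-01-01T08:25\n"
)


def read_from_csv(file_obj):
    """Read rows from a CSV file-like object as a list of dicts."""
    return [dict(row) for row in csv.DictReader(file_obj)]


def read_from_rdbms(conn):
    """Read rows from the hypothetical `flights` table as a list of dicts."""
    cursor = conn.execute("SELECT * FROM flights")
    # cursor.description holds one tuple per column; the name is item 0.
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


def demo():
    """Load the same sample record from both kinds of source."""
    csv_rows = read_from_csv(io.StringIO(SAMPLE_CSV))

    # An in-memory SQLite database stands in for a real RDBMS here.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flights ("
        "flight_id TEXT, scheduled_departure TEXT, actual_departure TEXT)"
    )
    conn.execute(
        "INSERT INTO flights VALUES (?, ?, ?)",
        ("XX100", "2024-01-01T08:00", "2024-01-01T08:25"),
    )
    db_rows = read_from_rdbms(conn)
    conn.close()
    return csv_rows, db_rows


if __name__ == "__main__":
    csv_rows, db_rows = demo()
    print(csv_rows)
    print(db_rows)
```

Returning plain dictionaries from both readers keeps the downstream cleaning steps independent of where the data came from. Note that a CSV reader returns every value as a string, whereas a database returns values in their stored column types, so a real pipeline would likely need an explicit type-conversion step.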
Start this process by identifying what data is easily available. The SME has mentioned that the flight records data, which covers the flight information, the scheduled and actual departure times, and the scheduled and actual arrival times, is readily available. This information is available in the object store of your organization...