Summary
Loading and saving are among the most common operations when wrangling data. Optimus provides a workflow for creating connections to data sources, and those connections can be reused for both loading and saving data. Optimus also supports widely used file storage services, such as Amazon S3 and Google Cloud Storage, and database connections, such as PostgreSQL and MySQL, so that the tools you need are at hand.
For databases, we looked at the drivers that each engine and database technology requires in order to load and save data.
We also explored how to optimize dataframe memory usage. This is a very important step when you are handling big data, since it could reduce your memory footprint by as much as 50%.
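As a reminder of the idea behind memory optimization, here is a minimal sketch that uses plain pandas rather than the Optimus API. It assumes the savings come from storing numeric columns in the smallest data type that can hold their values. The function name downcast_numeric and the sample columns are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd


def downcast_numeric(df):
    """Return a copy whose numeric columns use the smallest suitable dtype.

    Hypothetical helper for illustration; this is not an Optimus function.
    """
    out = df.copy()
    # Integer columns: shrink to the narrowest integer type that fits the values.
    for col in out.select_dtypes(include=["integer"]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    # Float columns: shrink to float32 when the values survive the conversion.
    for col in out.select_dtypes(include=["floating"]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out


# Sample data (made up): both columns start out as 64-bit types.
df = pd.DataFrame({
    "id": np.arange(1000, dtype="int64"),
    "price": np.linspace(0.0, 1.0, 1000),
})

small = downcast_numeric(df)

# Compare total memory usage in bytes before and after downcasting.
before = df.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(before, after)
```

The actual saving depends on the data. Columns whose values need the full 64-bit range will not shrink, and converting floats to a narrower type trades some precision for memory.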
In the next chapter, we will start exploring some basic methods for filtering, deduplicating, and transforming data for further analysis.