Summary
In this chapter, we discussed the basic architectural principles for building and managing a data platform. We looked at data lakes, which can hold vast amounts of raw data, and at how we can build these lakes on top of cloud storage. The next step is to fetch the right data that is usable in data models. We must extract, transform, and load the right data sets into environments where data analysts can work with them. This process is known as ETL, or ELT when the data is loaded before it is transformed. Typically, data warehouses are used for this.
We studied the data operations propositions of the major cloud providers: AWS, Azure, Google Cloud, Alibaba, and Oracle. Next, we discussed the challenges that come with building and operating data platforms. These include access to data and data accuracy, but also privacy and compliance. Data gravity is another problem that we must solve. It is not easy to move huge amounts of data across platforms, hence we must find other solutions to work with data in different...