Summary
Throughout this chapter, we focused on designing and optimizing data write operations. We discussed how to choose the right data sink technology, how file formats affect storage efficiency and query performance, and why choosing the right format for your use case matters. Finally, we discussed why data partitioning is crucial for optimizing query performance and resource utilization.
In the next chapter, we will start transforming the data that has been written to the data sink, detecting and handling outliers and missing values to better prepare it for downstream analytics.