Diving Deeper – Preparing and Processing Data for AI/ML Workloads on Google Cloud
In the previous chapter, we did some very rudimentary data exploration by looking at a few details relating to our dataset, using functions such as pandas.DataFrame.info() and pandas.DataFrame.head(). In this chapter, we will dive deeper into the realm of data exploration and preparation for data science workloads, as represented by the section highlighted in blue in the data science life-cycle diagram shown in Figure 6.1:
Figure 6.1: Data exploration and processing
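As a quick reminder of the rudimentary exploration step mentioned above, the following is a minimal sketch of calling pandas.DataFrame.info() and pandas.DataFrame.head() in a notebook. It assumes a small, hypothetical CSV dataset (the column names and values are invented purely for illustration and are not the dataset from the previous chapter):

```python
import io

import pandas as pd

# Hypothetical toy dataset, used only to illustrate the two calls below
csv_data = io.StringIO(
    "age,income,purchased\n"
    "34,52000,yes\n"
    "28,,no\n"
    "45,61000,yes\n"
    "39,58000,no\n"
)
df = pd.read_csv(csv_data)

# info() prints a concise summary to stdout: the index range, each column's
# dtype, its non-null count, and the approximate memory usage
df.info()

# head(n) returns the first n rows (n defaults to 5)
print(df.head(3))
```

In this toy data, info() would reveal that the income column has one missing value and was therefore parsed as float64. Summaries like this are often the first hint of what data preparation steps will be needed.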
In the early stages of a typical data science project, you would likely perform many of the data exploration and preparation steps in Jupyter notebooks, which, as we have seen, are useful for experimenting with small datasets. When you bring your workload into production, however, you are likely to use much larger datasets, in which case you would usually need to use different tools for processing your...