Building an Extract, Transform, Machine Learning Use Case
As in Chapter 8, Building an Example ML Microservice, the aim of this chapter is to crystallize many of the tools and techniques we have learned about throughout this book by applying them to a realistic scenario. The scenario is based on another use case introduced in Chapter 1, Introduction to ML Engineering, where we imagined the need to cluster taxi ride data on a scheduled basis. To explore some of the other concepts introduced throughout the book, we will also assume that each taxi ride comes with a series of textual data from a range of sources, such as traffic news sites and transcripts of calls between the taxi driver and the base, joined to the core ride information. We will then pass this data to a Large Language Model (LLM) for summarization. The result of this summarization can then be saved in the target data location alongside the basic ride data to provide important context...
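To make the shape of this flow concrete, the following is a minimal sketch of how the steps described above could be chained together. It is not the implementation built in this chapter: the names Ride, EnrichedRide, run_etml, toy_cluster, and toy_summarize, along with the ride_dist and ride_time fields, are hypothetical and chosen only for illustration. The clustering and summarization steps are passed in as plain functions, and the toy versions shown here are simple stand-ins (a distance threshold and a text truncation), not a real clustering algorithm or an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Ride:
    """One extracted ride with its joined textual data (hypothetical schema)."""
    ride_id: str
    ride_dist: float   # assumed numerical feature
    ride_time: float   # assumed numerical feature
    texts: List[str]   # e.g. traffic news snippets, call transcripts


@dataclass
class EnrichedRide:
    """The record that would be saved to the target data location."""
    ride_id: str
    cluster_label: int
    summary: str


def run_etml(
    rides: Sequence[Ride],
    cluster_fn: Callable[[List[List[float]]], List[int]],
    summarize_fn: Callable[[str], str],
) -> List[EnrichedRide]:
    """Cluster the rides, summarize their text, and return enriched records."""
    # Transform: build the feature matrix used for clustering.
    features = [[r.ride_dist, r.ride_time] for r in rides]

    # Machine learning step 1: one cluster label per ride.
    labels = cluster_fn(features)
    if len(labels) != len(rides):
        raise ValueError("cluster_fn must return one label per ride")

    # Machine learning step 2: summarize the textual data joined to each ride.
    enriched = []
    for ride, label in zip(rides, labels):
        joined_text = "\n".join(ride.texts)
        enriched.append(
            EnrichedRide(ride.ride_id, label, summarize_fn(joined_text))
        )
    return enriched


def toy_cluster(features: List[List[float]]) -> List[int]:
    # Stand-in only: label rides with distance above 10.0 as -1, others as 0.
    return [-1 if dist > 10.0 else 0 for dist, _ in features]


def toy_summarize(text: str) -> str:
    # Stand-in only: truncate the text instead of calling an LLM.
    return text[:40]
```

Calling run_etml(rides, toy_cluster, toy_summarize) returns one EnrichedRide per input ride. Keeping the two model steps as injected functions means the stand-ins can later be swapped for a real clustering model and a real LLM client without changing the surrounding flow.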