Gathering raw data into the data warehouse
ZenML orchestrates the data collection pipeline, so the pipeline can be run manually, scheduled, or triggered by specific events. Here, we will show you how to run it manually; we will discuss the other scenarios in Chapter 11 when digging deeper into MLOps.
We configured a different pipeline run for each author and provided one ZenML configuration file for Paul Iusztin’s data and another for Maxime Labonne’s. For example, to run the data collection pipeline on Maxime’s data, you can use the following CLI command:
poetry poe run-digital-data-etl-maxime
That will call the pipeline with the following ZenML YAML configuration file:
parameters:
  user_full_name: Maxime Labonne # [First Name(s)] [Last Name]
  links:
    # Personal Blog
    - https://mlabonne.github.io/blog/posts/2024-07-29_Finetune_Llama31.html
    - https://mlabonne.github.io/blog/posts/2024-07-15_The_Rise_of_Agentic_Data_Generation...
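To make the shape of these parameters concrete, here is a minimal Python sketch of how a pipeline might consume them. It is only an illustration under stated assumptions: the config dictionary mirrors the YAML above by hand (only the first, complete link is included), and the helpers split_user_full_name and validate_links are hypothetical names introduced here, not necessarily the ones used in the actual codebase.

```python
from urllib.parse import urlparse

# Hand-written mirror of the YAML parameters above (assumption: in practice,
# ZenML loads these from the configuration file and passes them to the pipeline).
config = {
    "parameters": {
        "user_full_name": "Maxime Labonne",
        "links": [
            "https://mlabonne.github.io/blog/posts/2024-07-29_Finetune_Llama31.html",
        ],
    }
}


def split_user_full_name(full_name: str) -> tuple[str, str]:
    """Hypothetical helper: split '[First Name(s)] [Last Name]' into two parts.

    Everything before the final token is treated as the first name(s).
    """
    tokens = full_name.split()
    if len(tokens) < 2:
        raise ValueError("expected '[First Name(s)] [Last Name]'")
    return " ".join(tokens[:-1]), tokens[-1]


def validate_links(links: list[str]) -> list[str]:
    """Hypothetical helper: keep only well-formed http(s) URLs."""
    valid = []
    for link in links:
        parsed = urlparse(link)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            valid.append(link)
    return valid


params = config["parameters"]
first_name, last_name = split_user_full_name(params["user_full_name"])
links = validate_links(params["links"])
```

The comment next to user_full_name in the YAML file is what motivates the split: the last whitespace-separated token is read as the last name, and anything before it as the first name(s).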