Differentiating ETL and ELT
You don’t always have to collect your data manually. Some programs will automatically pull data from a source, prepare it for use, and move it to a new location, usually your local environment. The code that automates this process is called a data pipeline. There are many kinds of data pipelines, and each must be tuned to the data you are pulling and what you need to do with it. While more complicated pipelines can automate entire modeling and reporting processes, the exam focuses on two types, and both share the same three steps (illustrated in the sketch after this list):
- Extraction
- Transformation
- Loading
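To make the three steps concrete, here is a minimal sketch of an ETL pipeline in Python. The file name, database path, table name, and cleaning rules are all hypothetical stand-ins for whatever your real source, destination, and transformation would be.

```python
import sqlite3

import pandas as pd


def extract(csv_path: str) -> pd.DataFrame:
    """Extraction: pull the raw data from its original source (here, a CSV file)."""
    return pd.read_csv(csv_path)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Transformation: prepare the data for use (here, drop empty rows
    and normalize a hypothetical 'name' column)."""
    df = df.dropna()
    df["name"] = df["name"].str.strip().str.lower()
    return df


def load(df: pd.DataFrame, db_path: str, table: str) -> None:
    """Loading: move the prepared data to its new location (here, a local SQLite database)."""
    with sqlite3.connect(db_path) as conn:
        df.to_sql(table, conn, if_exists="replace", index=False)


# Run the pipeline end to end: extract, then transform, then load.
load(transform(extract("sales.csv")), "warehouse.db", "sales")
```

The only structural difference in an ELT pipeline is the order of the last two calls: the raw extract is loaded into the destination first, and the transformation happens there afterward.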
Extraction is the step of pulling the data from the original source. The source can be a database you own, an outside database, or even an automated web scraping system; the source itself doesn’t matter. Extraction is picking up the information from wherever it was originally stored, much like the process you would follow if you collected the data manually yourself...
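As a rough illustration that the source doesn’t matter, the sketch below extracts rows from two different kinds of sources: an outside HTTP endpoint and a database you own. The URL, database path, and table name are hypothetical placeholders.

```python
import sqlite3

import requests  # third-party HTTP library; pip install requests


def extract_from_api(url: str) -> list[dict]:
    """Extract records from an outside source exposed over HTTP (hypothetical endpoint)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly if the source is unavailable
    return response.json()


def extract_from_database(db_path: str) -> list[tuple]:
    """Extract rows from a database you own (hypothetical 'orders' table)."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute("SELECT * FROM orders").fetchall()
```

Either function hands back the same thing: raw records, picked up from wherever they were originally stored, ready for the transformation step.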