Extracting data means getting data from different sources and combining it before transforming it in different ways. PDI can connect to a long list of data sources, including many kinds of databases, both commercial and open source. It can also read a wide variety of files, both structured and unstructured, including CSV files, properties files, fixed-width text files, and proprietary formats. In particular, this chapter explains how to get data from plain files and relational databases.
The following topics will be covered in this chapter:
- Getting data from plain files
- Getting data from relational databases
- Getting data from other sources
- Combining different sources into a single dataset