Reading table files
The third type of data source consists of common table files, such as Excel, CSV, TXT, XML, or even HTML files. The only requirement for these data sources is that their content follows a readable, understandable structure. Extracting data from them is easiest when they are laid out as a traditional table, that is, only rows and columns (like any table in a database). However, these files sometimes contain extra information that is not part of the core table (such as report headers or footers), so additional transformations via script are required.
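As a rough illustration of such a transformation, the following sketch uses only Python's standard `csv` module. It assumes a hypothetical CSV export with two title lines above the column names and a "Total" line below the data, and that we already know how many extra lines to skip at each end. The function `read_core_table` and the sample data are illustrative, not part of any library.

```python
import csv
import io

# Hypothetical export: two title lines above the table and one total line below it.
RAW = """Sales report
Generated by the finance team
region,month,amount
North,2023-01,1200
South,2023-01,950
Total,,2150
"""


def read_core_table(text, header_lines, footer_lines):
    """Return the rows of the core table as dicts, skipping the extra lines.

    header_lines: number of non-table lines before the column names.
    footer_lines: number of non-table lines after the last data row.
    """
    lines = text.strip().splitlines()
    # Keep only the column-name line and the data rows.
    core = lines[header_lines:len(lines) - footer_lines]
    reader = csv.DictReader(io.StringIO("\n".join(core)))
    return list(reader)


rows = read_core_table(RAW, header_lines=2, footer_lines=1)
for row in rows:
    print(row["region"], int(row["amount"]))
```

If pandas is available, `pandas.read_csv` offers the `skiprows` and `skipfooter` parameters for the same purpose (`skipfooter` requires `engine="python"`).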
Note
In Chapter 9, Basic Data Transformation, we will talk about some techniques for dealing with unstructured table files.
The ability to read table files is especially useful when we want to combine information from the DBMS with data generated by business users that might not be stored in a database, for instance, budget forecasts, external market indicators...
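A minimal sketch of this kind of combination follows, again using only the Python standard library. An in-memory SQLite database stands in for the DBMS, and a small CSV string stands in for a hypothetical budget forecast file maintained by a business user. Table names, column names, and values are assumptions made for illustration. The two sources are joined on a shared `region` column.

```python
import csv
import io
import sqlite3

# Stand-in for the DBMS: an in-memory SQLite database with actual sales.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 1200.0), ("South", 950.0)],
)

# Stand-in for a file maintained by a business user (hypothetical budget forecast).
BUDGET_CSV = """region,budget
North,1000
South,1100
"""

# Load the file into a lookup table keyed by region.
budget = {
    row["region"]: float(row["budget"])
    for row in csv.DictReader(io.StringIO(BUDGET_CSV))
}

# Join both sources on the shared "region" column and compute the deviation.
report = []
for region, actual in conn.execute("SELECT region, amount FROM sales ORDER BY region"):
    target = budget.get(region)
    deviation = actual - target if target is not None else None
    report.append((region, actual, target, deviation))
conn.close()

for line in report:
    print(line)
```

Regions that appear in the database but not in the file get `None` for the budget and deviation rather than raising an error, which keeps the report usable when the user-maintained file is incomplete.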