Chapter 2: Connecting to Data Sources
In most organizations, data is spread across many kinds of data stores: filesystems, proprietary and open source databases, and even distributed filesystems for high-performance compute platforms. Often the data is meaningful and useful where it sits in its source system, such as a transactional database that keeps track of sales from a group of point-of-sale systems. In this example, the data is stored in a relational database that is tuned to record each sale. For analytics purposes, we will likely want to use this data together with data from a separate system that tracks the inventory of items we have for sale. The inventory system will likely be a different relational database, possibly from another technology vendor. To better understand whether we are stocking too many items for sale (or not enough), we need to create a view of the data from both the sales and inventory databases.
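As a rough illustration of such a combined view, the sketch below joins sales records with inventory records held in two separate databases. It is only a stand-in: both databases here are in-memory SQLite databases linked with SQLite's ATTACH DATABASE, whereas the scenario above involves systems that may come from different vendors. The table names, column names, sample rows, and the function build_combined_view are all hypothetical.

```python
import sqlite3


def build_combined_view():
    """Join sales and inventory data that live in two separate databases.

    Assumption: both stores are in-memory SQLite databases; real sales and
    inventory systems would likely be different products entirely.
    """
    # The main connection plays the role of the sales database.
    conn = sqlite3.connect(":memory:")
    # A second, independent in-memory database plays the inventory system.
    conn.execute("ATTACH DATABASE ':memory:' AS inventory")

    # Hypothetical schemas for the two systems.
    conn.execute("CREATE TABLE main.sales (item_id INTEGER, quantity INTEGER)")
    conn.execute(
        "CREATE TABLE inventory.stock "
        "(item_id INTEGER, item_name TEXT, on_hand INTEGER)"
    )

    # Made-up sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO main.sales VALUES (?, ?)",
        [(1, 3), (1, 2), (2, 1)],
    )
    conn.executemany(
        "INSERT INTO inventory.stock VALUES (?, ?, ?)",
        [(1, "widget", 4), (2, "gadget", 50), (3, "gizmo", 10)],
    )

    # One query spanning both databases: units sold next to units on hand.
    # The LEFT JOIN keeps items that have stock but no recorded sales.
    rows = conn.execute(
        """
        SELECT s.item_name,
               s.on_hand,
               COALESCE(SUM(t.quantity), 0) AS units_sold
        FROM inventory.stock AS s
        LEFT JOIN main.sales AS t ON t.item_id = s.item_id
        GROUP BY s.item_id, s.item_name, s.on_hand
        ORDER BY s.item_id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for item_name, on_hand, units_sold in build_combined_view():
        print(item_name, on_hand, units_sold)
```

With these sample rows, an item that has sold more units than remain on hand may be understocked, while an item with plenty of stock and no sales may be overstocked. Answering that question requires a single view over both the sales and inventory databases.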
Over the past few decades, this has been the goal...