Introduction
While flat files and databases are the most common types of sources that developers using Kettle interact with, many other types of data sources can be used. Data warehouses are now starting to leverage tools such as Hadoop, NoSQL databases, and cloud services such as Amazon Web Services and Salesforce.
In this chapter, you will learn to interact with these Big Data sources in Kettle. The recipes are grouped by data source, with each group covering how to connect to, read data from, and load data into the given data source.
The focus of this chapter is on data sources that are usually too large to set up easily for working through exercises. For each data source, the recipe Connecting to a database in Chapter 1, Working with Databases, will cover how to connect to the given data source, as well as recommendations for setting up a test environment in which to work.