Connecting to the Spark platform and preparing the data
In this section, we'll show you how to start the Spark application in KNIME by connecting to the cluster and loading data into it. We'll also introduce the Spark nodes that you'll need for the prediction task in the next section. We'll cover these topics in the following subsections:
- Introducing the Hadoop ecosystem
- Accessing the data and loading it into Spark
- Introducing the Spark compatible nodes
In the first subsection, we explain what the Hadoop ecosystem is and show how to access it from a KNIME workflow.
Introducing the Hadoop ecosystem
The Apache Hadoop ecosystem is an open-source software framework that combines several computers into a computing cluster, within which processing tasks are split into smaller pieces and executed in parallel. Figure 12.1 illustrates the Hadoop framework:
Figure 12.1 – Illustrating the Hadoop software...
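The idea of splitting a processing task into smaller pieces and executing them in parallel can be sketched on a single machine. The following Python snippet is only an analogy, not Hadoop or Spark code: the worker threads stand in for the computers of a cluster, and the function names (split_into_chunks, partial_sum, parallel_sum) are hypothetical names chosen for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, n_chunks):
    """Split a list into at most n_chunks contiguous pieces of similar size."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_sum(chunk):
    """Work done on one piece; on a cluster, this would run on one node."""
    return sum(chunk)


def parallel_sum(values, n_workers=4):
    """Sum values by splitting them into pieces and processing them in parallel."""
    chunks = split_into_chunks(values, n_workers)
    # Assumption: local threads play the role of the cluster's computers.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Combine the partial results into the final answer.
    return sum(partials)


data = list(range(1, 101))
total = parallel_sum(data)
print(total)
```

Each piece is processed independently, and only the small partial results are combined at the end. This is the same division of labor that a cluster applies on a much larger scale, with the pieces distributed across separate machines.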