Managing the distributed data pipeline
The preceding data pipeline runs on every node in the cluster. Because of that, you had to create the same path on both nodes for the PutFile processor to work. Earlier, you learned that several processors can cause race conditions, where more than one node tries to read the same file at the same time, which will cause problems. To resolve these issues, you can specify that a processor should run only on the Primary Node, as an isolated process.
In the configuration for the PutFile processor, select the Scheduling tab. In the dropdown for Scheduling Strategy, choose On primary node, as shown in the following screenshot:
Now, when you run the data pipeline, the files will only be placed on the Primary Node. You can schedule processors such as GetFile or ExecuteSQL to do the same thing.
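The same setting can also be applied without the GUI. The following is a minimal sketch, not a definitive implementation: it only builds the JSON body that would be sent to the NiFi REST API. It assumes that the processor update endpoint is PUT /nifi-api/processors/{id} and that the processor configuration accepts an executionNode field with the value PRIMARY, so check both against the REST API documentation for your NiFi version. The function name, the processor ID, and the revision number are hypothetical placeholders.

```python
import json


def build_primary_node_update(processor_id, revision_version):
    """Build the request body that restricts a processor to the Primary Node.

    Assumption: NiFi expects the current revision version of the processor
    together with the changed configuration fields.
    """
    return {
        "revision": {"version": revision_version},
        "component": {
            "id": processor_id,
            # Assumed field and value; "ALL" would run the processor on every node.
            "config": {"executionNode": "PRIMARY"},
        },
    }


if __name__ == "__main__":
    # Hypothetical processor ID and revision, for illustration only.
    body = build_primary_node_update("hypothetical-processor-id", 3)
    print(json.dumps(body, indent=2))
```

You would send this body with the HTTP client of your choice, after first reading the processor's current revision from the API. The processor usually needs to be stopped before its configuration can be changed.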
To see the load of the data pipeline on each node, you...