Introducing Apache NiFi for dataflows
Apache NiFi automates dataflows by receiving data from any source, such as Twitter, Kafka, or databases, sending it to a data processing system, such as Hadoop or Spark, and finally delivering it to a storage system, such as HBase, Cassandra, or another database. Problems can arise at any of these three layers: systems go down, or data production and consumption rates fall out of sync. Apache NiFi addresses these dataflow challenges with the following key features:
Guaranteed delivery with write-ahead logs
Data buffering with Back Pressure and Pressure Release
Prioritized queuing (oldest first, newest first, largest first, and so on)
Configurations for low latency, high throughput, loss tolerance, and so on
Data provenance, which records all data events for later discovery or debugging
Data is rolled off as it ages
Visual Command and Control, which provides dataflow visualizations and lets users modify existing dataflows on the fly
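The queuing behavior behind back pressure and prioritized queuing can be sketched in plain Python. This is a conceptual model only, not NiFi's actual API: the `PrioritizedConnection` class, its `object_threshold` parameter, and the flowfile dictionaries are all illustrative stand-ins for NiFi's connection queues, back-pressure object threshold, and flowfile prioritizers.

```python
import heapq

class PrioritizedConnection:
    """A bounded queue with a pluggable prioritizer, modeled loosely on a
    NiFi connection: back pressure engages when the object threshold is
    reached, and the prioritizer decides which flowfile leaves first."""

    def __init__(self, object_threshold, prioritizer):
        self.object_threshold = object_threshold  # back-pressure limit (object count)
        self.prioritizer = prioritizer            # key function, e.g. oldest first
        self._heap = []
        self._counter = 0                         # tie-breaker for stable ordering

    def back_pressure_engaged(self):
        # Upstream producers should pause while this returns True.
        return len(self._heap) >= self.object_threshold

    def offer(self, flowfile):
        """Try to enqueue; refuse when back pressure is engaged."""
        if self.back_pressure_engaged():
            return False
        heapq.heappush(self._heap,
                       (self.prioritizer(flowfile), self._counter, flowfile))
        self._counter += 1
        return True

    def poll(self):
        """Dequeue the highest-priority flowfile, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

# "Oldest first" prioritization: the smallest timestamp dequeues first.
conn = PrioritizedConnection(object_threshold=3,
                             prioritizer=lambda ff: ff["timestamp"])
for ts in (30, 10, 20):
    conn.offer({"timestamp": ts})

assert conn.back_pressure_engaged()            # threshold reached: producer pauses
assert conn.offer({"timestamp": 5}) is False   # back pressure rejects new data
assert conn.poll()["timestamp"] == 10          # oldest flowfile leaves first
```

Swapping the prioritizer key function (for example, a negated size for "largest first") changes the dequeue order without touching the queue itself, which mirrors how NiFi lets you attach different prioritizers to a connection.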