Batch analytics
Batch analytics in Apache Flink is quite similar to streaming analytics, because Flink handles both types of analytics with the same APIs. This gives a lot of flexibility and allows code to be reused across both types of analytics.
In this section, we will look at some analytical jobs on the sample data we are using, OnlineRetail.csv. We will also be loading cities.csv and temperature.csv to do some more join operations.
Reading file
Flink comes with several built-in formats to create data sets from common file formats. Many of them have shortcut methods on the execution environment.
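As a rough illustration of such shortcut methods, the following minimal sketch reads a text file and a two-column CSV file through the execution environment. It assumes a Flink 1.x release with the Java DataSet API (org.apache.flink.api.java) on the classpath. The class name ReadFileSketch, its helper methods, and the two-string-column CSV layout are hypothetical and chosen only for this example. They do not describe the layout of the sample files used in this section.

```java
import java.util.List;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

// Hypothetical helper class, used only for illustration.
public class ReadFileSketch {

    // Reads a text file line by line and returns the lines as a local list.
    static List<String> readLines(String path) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> lines = env.readTextFile(path);
        // collect() runs the job and brings the result back to the client.
        return lines.collect();
    }

    // Reads a comma-delimited file, assuming exactly two string columns per row.
    static List<Tuple2<String, String>> readTwoColumns(String path) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Tuple2<String, String>> rows = env
                .readCsvFile(path)
                .fieldDelimiter(",")
                .types(String.class, String.class);
        return rows.collect();
    }

    public static void main(String[] args) throws Exception {
        // The file path is passed as the first command-line argument.
        for (String line : readLines(args[0])) {
            System.out.println(line);
        }
    }
}
```

Because a data set may be processed in parallel, the order of the collected elements is not guaranteed to match the order of the lines in the file.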
File-based
File-based sources can be read using the following APIs:
readTextFile(path) / TextInputFormat: Reads files line-wise and returns them as strings.
readTextFileWithValue(path) / TextValueInputFormat: Reads files line-wise and returns them as StringValues. StringValues are mutable strings.
readCsvFile(path) / CsvInputFormat: Parses files of comma (or another char) delimited...