Streaming sources
Spark Streaming divides streaming sources into two categories: basic sources and advanced sources. Sources that are directly available through StreamingContext, such as filesystem and socket streams, are called basic sources, while sources that require additional dependency linkage, such as Kafka and Flume, are called advanced sources. Streaming sources can also be classified by reliability. If the receiver sends an acknowledgement to the source system after receiving and replicating the messages, it is called a reliable receiver, as is the case with the Kafka API. If the receiver does not send an acknowledgement to the source system, it is termed an unreliable receiver.
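The difference between the two kinds of receiver is easiest to see when a receiver fails between receiving a message and replicating it. The following is a minimal plain-Python sketch of that idea and not Spark's receiver API: the names Source, deliver, and crash_at are hypothetical and exist only for this illustration, and the model assumes that the source system keeps every message it has handed out until an acknowledgement arrives.

```python
from collections import deque


class Source:
    """Hypothetical source system (illustration only, not a Spark class)."""

    def __init__(self, messages, keeps_unacked):
        self.pending = deque(messages)      # messages not yet handed out
        self.keeps_unacked = keeps_unacked  # True when the receiver is expected to ack
        self.unacked = []                   # handed out, but not yet acknowledged

    def poll(self):
        """Hand out the next message, remembering it if an ack is expected."""
        msg = self.pending.popleft()
        if self.keeps_unacked:
            self.unacked.append(msg)
        return msg

    def ack(self, msg):
        """The receiver confirms that msg has been received and replicated."""
        self.unacked.remove(msg)

    def replay_unacked(self):
        """After a receiver failure, queue unacknowledged messages for redelivery."""
        self.pending.extendleft(reversed(self.unacked))
        self.unacked.clear()


def deliver(messages, reliable, crash_at):
    """Return the messages that reach replicated storage.

    The receiver fails exactly once: right after receiving the message that
    would become element crash_at of the store, and before replicating it.
    """
    source = Source(messages, keeps_unacked=reliable)
    store = []       # stands in for replicated storage
    crashed = False
    while source.pending:
        msg = source.poll()
        if not crashed and len(store) == crash_at:
            crashed = True            # the in-flight message is lost by the receiver
            source.replay_unacked()   # has an effect only if the source tracks acks
            continue
        store.append(msg)             # message received and replicated
        if reliable:
            source.ack(msg)           # acknowledge only after replication
    return store


print(deliver(["a", "b", "c"], reliable=True, crash_at=1))   # ['a', 'b', 'c']
print(deliver(["a", "b", "c"], reliable=False, crash_at=1))  # ['a', 'c']
```

In this toy model the reliable receiver loses nothing, because the unacknowledged message b is delivered again after the failure, whereas the unreliable receiver silently drops it.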
Some common streaming sources, other than socket streaming, which was discussed in the previous examples, are explained in the next section.
fileStream
Data files in any directory can be read using the fileStream() API of StreamingContext. The fileStream...