Chapter 7: Extending Apache Beam's I/O Connectors
In previous chapters, we focused on transforming data once it had been read from a data source. There are two types of sources: bounded and unbounded. The distinction is straightforward: a bounded source has a finite size that is known in advance, while an unbounded source is potentially infinite. A classic example of a bounded source is a file (or a set of immutable files), while an unbounded source is typically a streaming source such as Apache Kafka. Note that we can always convert an unbounded source to a bounded one by defining a bounding constraint. This could be, for example, the number of records that we want to read, or the duration (in processing time or event time) for which we want to read the data.
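The two bounding constraints mentioned above (a maximum number of records and a maximum read duration) can be illustrated with a minimal sketch. This is plain Java, not the Beam API: the class `BoundingSketch` and the helpers `unboundedCounter`, `readAtMost`, and `readFor` are hypothetical names introduced only for this illustration, and the duration shown here is processing time measured by an injected clock.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical illustration only; none of these names come from Apache Beam.
public class BoundingSketch {

  /** A conceptually unbounded source: yields 0, 1, 2, ... and never ends. */
  static Iterator<Long> unboundedCounter() {
    return new Iterator<Long>() {
      private long next = 0;

      @Override
      public boolean hasNext() {
        return true; // an unbounded source always has more data
      }

      @Override
      public Long next() {
        return next++;
      }
    };
  }

  /** Bounds a source by record count: stops after maxRecords elements. */
  static <T> List<T> readAtMost(Iterator<T> source, long maxRecords) {
    List<T> out = new ArrayList<>();
    while (out.size() < maxRecords && source.hasNext()) {
      out.add(source.next());
    }
    return out;
  }

  /**
   * Bounds a source by processing time: stops once the clock reaches the
   * deadline. The clock is passed in (as nanoseconds) so that the behavior
   * can be made deterministic in tests.
   */
  static <T> List<T> readFor(
      Iterator<T> source, Duration maxReadTime, LongSupplier clockNanos) {
    long deadline = clockNanos.getAsLong() + maxReadTime.toNanos();
    List<T> out = new ArrayList<>();
    while (clockNanos.getAsLong() < deadline && source.hasNext()) {
      out.add(source.next());
    }
    return out;
  }

  public static void main(String[] args) {
    // Count-based bound: prints [0, 1, 2, 3, 4]
    System.out.println(readAtMost(unboundedCounter(), 5));

    // Time-based bound: the number of elements depends on the machine.
    List<Long> timed =
        readFor(unboundedCounter(), Duration.ofMillis(10), System::nanoTime);
    System.out.println("Read " + timed.size() + " elements in about 10 ms");
  }
}
```

In both cases the underlying source stays infinite; only the reader decides when to stop, which is what makes the result bounded. Beam's `KafkaIO` read transform offers comparable options (`withMaxNumRecords` and `withMaxReadTime`), although the sketch above does not depend on them.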
In Apache Beam, these two types of sources historically led to two types of interfaces, both of which are now considered deprecated: BoundedSource and UnboundedSource...