The RecordReader class
Unlike InputSplit, which presents a byte-oriented view of the input, the RecordReader class presents a record view of the data to the Map task. A RecordReader works within each InputSplit and generates records from the data in the form of key-value pairs. The InputSplit boundary is a guideline for the RecordReader, not a hard limit. At one extreme, a custom RecordReader can be written to read an entire file (though this is not encouraged). More commonly, a RecordReader has to read past the end of its own InputSplit into the next one in order to present a complete record to the Map task. This happens whenever a record spans the boundary between two InputSplits.
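To make the boundary handling concrete, here is a minimal, Hadoop-free sketch that simulates a line-oriented reader over an in-memory byte array. It assumes newline-terminated records and follows a simplified version of the convention used by Hadoop's LineRecordReader: every split except the first discards its first (possibly partial) line, and each reader keeps reading while a record starts at or before its split end, even if that record finishes inside the next split. The class SplitLineReader and its readSplit method are hypothetical illustrations, not part of the Hadoop API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Hadoop-free simulation of a line-oriented RecordReader.
// Records are '\n'-terminated lines in an in-memory byte array; a "split"
// is just a [start, start + length) byte range. This is NOT the Hadoop API.
public class SplitLineReader {

    // Returns the records owned by the split [start, start + length).
    public static List<String> readSplit(byte[] data, int start, int length) {
        int end = start + length;  // first byte of the next split
        int pos = start;
        List<String> records = new ArrayList<>();

        // Every split except the first discards everything up to and including
        // its first newline: that record is read by the previous split's reader.
        if (start != 0) {
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++;  // step over the newline itself
        }

        // Emit every record that starts at or before `end`. The last one may run
        // past `end`, i.e. bytes are read from the next split to complete it.
        while (pos <= end && pos < data.length) {
            int recordStart = pos;
            while (pos < data.length && data[pos] != '\n') pos++;
            records.add(new String(data, recordStart, pos - recordStart, StandardCharsets.UTF_8));
            pos++;  // step over the newline
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\ndelta\n".getBytes(StandardCharsets.UTF_8);
        int splitSize = 10;  // deliberately small so that records straddle splits
        for (int start = 0; start < data.length; start += splitSize) {
            int length = Math.min(splitSize, data.length - start);
            System.out.println("split@" + start + " -> " + readSplit(data, start, length));
        }
    }
}
```

With a 10-byte split size, the record bravo straddles the first boundary at byte 10, yet it is emitted exactly once, by the first split's reader. The program prints `split@0 -> [alpha, bravo]`, `split@10 -> [charlie, delta]` and `split@20 -> []`; the third split owns no records because delta, which starts exactly at its first byte, was already read by the second split.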
The bytes from a subsequent InputSplit are read through FSDataInputStream objects. Although this reading does not itself respect data locality, it generally fetches only a few bytes from the next split, so the performance overhead is not significant. However, when records are very large, the amount of data transferred from the next split can be significant and can affect performance...
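The effect of record size on this overhead can be estimated with another small, self-contained simulation. The SplitOverrun class below is hypothetical (not a Hadoop API): it applies the same newline-delimited boundary rule as a line-oriented reader and reports how many bytes the reader for a split consumes beyond the split's end. Fixed-length records are assumed purely for illustration.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop: estimates how many bytes a
// line-oriented reader fetches from beyond the end of its split in order
// to finish the last record it owns.
public class SplitOverrun {

    // Bytes read past `end` by the reader for the split [start, end).
    public static int bytesPastSplitEnd(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {  // skip the record owned by the previous split
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++;
        }
        while (pos <= end && pos < data.length) {  // read records starting at or before end
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++;
        }
        return Math.max(0, Math.min(pos, data.length) - end);
    }

    // Builds `count` records, each `recordLen` bytes long including its '\n'.
    public static byte[] records(int count, int recordLen) {
        byte[] data = new byte[count * recordLen];
        Arrays.fill(data, (byte) 'x');
        for (int i = recordLen - 1; i < data.length; i += recordLen) {
            data[i] = (byte) '\n';
        }
        return data;
    }

    public static void main(String[] args) {
        int splitEnd = 1000;  // pretend the first split covers bytes [0, 1000)
        System.out.println(bytesPastSplitEnd(records(100, 30), 0, splitEnd));
        System.out.println(bytesPastSplitEnd(records(3, 5000), 0, splitEnd));
    }
}
```

For this pretend 1,000-byte split, 30-byte records cause only 20 bytes to be read from the next split, while 5,000-byte records force 4,000 extra bytes, four times the size of the split itself. In a real cluster those extra bytes may have to be fetched from another node, which is where the performance cost comes from.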