Here is a summary of the best practices to follow while building an efficient input data pipeline in TF 2.0:
- It's recommended to apply the shuffle transformation (shuffle) before the repeat transformation (repeat), so that every element is seen once per epoch before any element is repeated.
- Use the prefetch transformation to overlap the work of the producer (fetching the next batch of data) and the consumer (using the current batch of data for training). The prefetch transformation should be added at the end of your input pipeline, after the shuffle (shuffle), repeat (repeat), and batch (batch) transformations. This should look something like this:
# buffer_size can be 1 or 2, meaning that 1 or 2 batches of data are prefetched
dataset = dataset.shuffle(count).repeat().batch(batch_size).prefetch(buffer_size)
- It's strongly recommended to parallelize...
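
The shuffle, repeat, batch, and prefetch ordering described above can be tried end to end with a small in-memory dataset. The following is only a minimal sketch: the build_pipeline helper, its parameter names, and the random NumPy data are assumptions made for illustration and are not part of the original pipeline.

```python
import numpy as np
import tensorflow as tf


def build_pipeline(features, labels, batch_size=32, prefetch_batches=1):
    """Hypothetical helper: shuffle -> repeat -> batch -> prefetch."""
    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    # Shuffle first, with a buffer covering the whole (small) dataset,
    # so each epoch is a full permutation of the examples.
    dataset = dataset.shuffle(buffer_size=len(features))
    # Repeat indefinitely; the training loop decides when to stop.
    dataset = dataset.repeat()
    dataset = dataset.batch(batch_size)
    # Prefetch last, so whole batches are prepared ahead of the consumer.
    dataset = dataset.prefetch(prefetch_batches)
    return dataset


# Illustrative data (assumption): 100 examples with 4 features each.
features = np.random.rand(100, 4).astype("float32")
labels = np.random.randint(0, 2, size=(100,)).astype("int32")

dataset = build_pipeline(features, labels, batch_size=32)
batch_features, batch_labels = next(iter(dataset))
print(batch_features.shape, batch_labels.shape)
```

Because repeat comes before batch here, batches can span epoch boundaries, so every batch has the full batch_size even when the number of examples is not a multiple of it.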