By default, most dataset methods process samples one by one, with no parallelism. However, this behavior can easily be changed, for example, to take advantage of multiple CPU cores. For instance, the .interleave() and .map() methods both have a num_parallel_calls parameter to specify the number of threads they can create (refer to the documentation at https://www.tensorflow.org/api_docs/python/tf/data/Dataset). Parallelizing the extraction and transformation of images can greatly reduce the time needed to generate training batches, so it is important to always set num_parallel_calls properly (for instance, to the number of CPU cores the processing machine has).
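As a minimal sketch of this idea, the following applies a transformation in parallel with .map(). The toy dataset and the preprocess function are placeholders of our own (in practice, preprocess would decode and transform an image); tf.data.experimental.AUTOTUNE lets tf.data pick the number of threads itself, as an alternative to passing the CPU core count explicitly:

```python
import tensorflow as tf

# A toy dataset of integers, standing in for e.g. image file contents:
dataset = tf.data.Dataset.range(8)

# A placeholder transformation; a real pipeline would decode/augment images here:
def preprocess(x):
    return x * 2

# num_parallel_calls allows .map() to run preprocess on several threads.
# AUTOTUNE asks tf.data to tune this value dynamically:
dataset = dataset.map(preprocess,
                      num_parallel_calls=tf.data.experimental.AUTOTUNE)

print([int(v) for v in dataset])
```

Note that the parallel calls affect throughput, not the results: the output elements are the same as with a sequential .map().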
TensorFlow also provides tf.data.experimental.parallel_interleave() (refer to the documentation at https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/data/experimental/parallel_interleave), a parallelized version of .interleave() with some additional options. For instance...