The TensorFlow tf.data API builds highly efficient input pipelines that can process data an order of magnitude faster than the Keras data input process. It can read data from local or distributed filesystems and process it in batches. For further details, refer to: https://www.tensorflow.org/guide/data.
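To make the idea of a pipeline concrete, here is a minimal sketch of the usual tf.data pattern: transform each element with `map`, group elements with `batch`, and overlap preprocessing with consumption using `prefetch`. It uses a tiny in-memory dataset instead of files, and it assumes TensorFlow 2.4 or later, where `tf.data.AUTOTUNE` is available (older 2.x releases expose it as `tf.data.experimental.AUTOTUNE`).

```python
import tensorflow as tf

# A small in-memory source; a real pipeline would usually start from files on disk.
numbers = tf.data.Dataset.from_tensor_slices(tf.range(10))

pipeline = (
    numbers
    # Transform each element; parallel calls still preserve element order by default.
    .map(lambda x: x * 2, num_parallel_calls=tf.data.AUTOTUNE)
    # Group consecutive elements into batches of 4 (the last batch may be smaller).
    .batch(4)
    # Prepare the next batch while the current one is being consumed.
    .prefetch(tf.data.AUTOTUNE)
)

batches = [batch.numpy().tolist() for batch in pipeline]
print(batches)  # [[0, 2, 4, 6], [8, 10, 12, 14], [16, 18]]
```

Letting `AUTOTUNE` choose the parallelism and prefetch buffer size is a common default; fixed values can be substituted when tuning for a specific machine.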
The following screenshot shows an image loading time comparison of tf.data versus the Keras image input process:
Note that tf.data loads 1,000 images in about 1.58 seconds, roughly 90 times faster than the Keras image input process.
Here are some common features of tf.data:
- Import Python's pathlib library to work with the file paths that the pipeline reads.
- tf.data.Dataset.list_files creates a dataset of all files matching a glob pattern.
- tf.strings.split splits the file path based on a delimiter.
- tf.image.decode_jpeg decodes a JPEG image into a tensor (note...
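The functions listed above are typically chained into a single image-loading pipeline. The sketch below is one possible arrangement, not the book's exact code: the `<data_dir>/<class_name>/<image>.jpg` layout, the class names `cats` and `dogs`, the 64×64 target size, and the helper `load_image` are all hypothetical. To keep it self-contained, it first writes a few random JPEGs to a temporary directory. It assumes TensorFlow 2.4 or later for `tf.data.AUTOTUNE`.

```python
import os
import pathlib
import tempfile

import tensorflow as tf

# Hypothetical layout: <data_dir>/<class_name>/<image>.jpg, filled with
# small random JPEGs so the sketch runs without a real dataset.
data_dir = pathlib.Path(tempfile.mkdtemp())
for class_name in ("cats", "dogs"):
    (data_dir / class_name).mkdir()
    for i in range(3):
        pixels = tf.random.uniform((32, 32, 3), maxval=256, dtype=tf.int32)
        jpeg = tf.io.encode_jpeg(tf.cast(pixels, tf.uint8))
        tf.io.write_file(str(data_dir / class_name / f"img_{i}.jpg"), jpeg)

class_names = sorted(p.name for p in data_dir.iterdir() if p.is_dir())

# tf.data.Dataset.list_files: a dataset of every path matching the glob pattern.
files = tf.data.Dataset.list_files(str(data_dir / "*" / "*.jpg"), shuffle=False)


def load_image(path):
    # tf.strings.split: break the path on the OS separator; the parent
    # directory name (second-to-last part) serves as the class label.
    parts = tf.strings.split(path, os.sep)
    label = tf.argmax(tf.cast(tf.equal(parts[-2], class_names), tf.int32))
    # tf.image.decode_jpeg: turn the raw JPEG bytes into a uint8 image tensor.
    image = tf.image.decode_jpeg(tf.io.read_file(path), channels=3)
    # Resize to a fixed shape and scale pixel values to roughly [0, 1].
    image = tf.image.resize(image, (64, 64)) / 255.0
    return image, label


dataset = files.map(load_image, num_parallel_calls=tf.data.AUTOTUNE).batch(4)

for images, labels in dataset.take(1):
    print(images.shape, labels.numpy())
```

Deriving the label from the directory name keeps the labels next to the images on disk, so no separate annotation file is needed; for a real dataset, replace the temporary directory with the actual image folder.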