Training and testing DL models requires data. That data is usually hosted on distributed, remote storage systems. You need to connect to these data sources and retrieve the data before the training phase can start, and you will probably need to do some preparation before feeding it to your model. This chapter goes through the phases of the Extract, Transform, Load (ETL) process as applied to DL. It covers several use cases in which the DeepLearning4j framework and Spark are used. The use cases presented here relate to batch data ingestion; data streaming is covered in the next chapter.
The following topics will be covered in this chapter:
- Training data ingestion through Spark
- Data ingestion from a relational database
- Data ingestion from a NoSQL database
- Data ingestion from S3