Generating a synthetic dataset for deep learning experiments
Synthetic data generation is the process of programmatically creating artificial data so that data scientists and machine learning engineers can test algorithms and run machine learning experiments without using real collected data. Because we will work with neural networks and deep learning frameworks, we need a reasonably large dataset. The dataset from Chapter 1, Getting Started with Machine Learning Using Amazon SageMaker, has only 20 records, which is too small for the recipes in this chapter. In this recipe, we will generate dummy training, validation, and test data using a custom synthetic data generator and store these datasets in Amazon S3.
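To make the idea concrete, here is a minimal sketch of what such a generator could look like. It is not the generator used in this recipe: the function names, the linear relationship between x and the label, the 60/20/20 split, and the file names are all assumptions chosen for illustration. It uses only the Python standard library and writes the three datasets to a temporary local directory.

```python
import csv
import random
import tempfile
from pathlib import Path


def generate_records(n, seed=42):
    """Return n (label, x) rows; the label is a noisy function of x."""
    rng = random.Random(seed)  # fixed seed so the data is reproducible
    rows = []
    for _ in range(n):
        x = rng.uniform(-5.0, 5.0)
        # Hypothetical relationship, chosen only for illustration.
        y = 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)
        rows.append((round(y, 4), round(x, 4)))
    return rows


def split_records(rows, train_frac=0.6, val_frac=0.2):
    """Split rows into training, validation, and test subsets.

    The rows are already independent random draws, so slicing
    them in order is enough; no extra shuffle is performed.
    """
    n_train = round(len(rows) * train_frac)
    n_val = round(len(rows) * val_frac)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test


def write_csv(rows, path):
    """Write rows to a header-less CSV file, label in the first column."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


records = generate_records(1000)
train, val, test = split_records(records)

# Hypothetical file names; written to a temporary local directory.
out_dir = Path(tempfile.mkdtemp())
for name, subset in [("training", train), ("validation", val), ("test", test)]:
    write_csv(subset, out_dir / f"{name}_data.csv")
```

The resulting CSV files could then be uploaded to an S3 bucket of your choice, for example with the AWS CLI or the boto3 S3 client. The bucket name and key prefix depend on your own setup and are left out here.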
Important note
Why generate and use synthetic datasets? Working with synthetic datasets lets us focus on the task at hand, as we can simply generate a bare-minimum synthetic dataset to...