The following reading material will deepen your understanding of the topics covered in this chapter:
- Pipeline models in Spark: https://blog.insightdatascience.com/spark-pipelines-elegant-yet-powerful-7be93afcdd42
- Batch Transform: https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html
- Topic modeling: https://medium.com/ml2vec/topic-modeling-is-an-unsupervised-learning-approach-to-clustering-documents-to-discover-topics-fdfbf30e27df
- Transformers in Spark: https://spark.apache.org/docs/1.6.0/ml-guide.html#transformers