Batch and Stream Data Processing Using PySpark
When you set up your architecture, you decided whether to support batch processing, streaming, or both. This chapter walks through batch and stream processing with Apache Spark using Python. Spark can be your go-to tool for moving and processing data at scale. We will also look at DataFrames and how to use them in both types of data processing.
In this chapter, we’re going to cover the following main topics:
- Batch processing
- Working with schemas
- User-defined functions
- Stream processing