In the previous chapter, we learned about Spark SQL, one of Spark's core components, which provides a SQL interface for querying large structured and semi-structured datasets. Spark SQL mainly provides APIs for batch analytics on structured data, but some domains need to analyze streaming data in real time or run machine learning algorithms on large volumes of training data. Spark provides different sets of APIs to meet such application requirements.
This chapter will cover the following topics:
- Spark Streaming
- Machine learning
- Graph APIs