In the previous chapters, we covered basic machine learning concepts, machine learning case studies, streaming data ingestion, and more. So far, we have walked you through the components of data ingestion and data processing, some advanced concepts of the big data ecosystem, and a few design best practices to consider when designing and implementing Hadoop applications. Every data pipeline needs infrastructure to run on, and that infrastructure can be set up either on premises or in the cloud. In this chapter, we will cover the following topics:
- Logical view of Hadoop in the cloud
- How a network setup looks on the cloud
- Resource management made easy
- How to make data pipelines on the cloud
- Cloud high availability