By now, we should understand what big data is and how it affects decision-making across industries. We have also created and configured our own big data virtual environment, so we can move on to building our own applications. These applications will grow more mature as we proceed through the book.
In this chapter, we will look at the Apache Hadoop project in detail, touching on why it was developed and on the advantages and disadvantages of using it. We will also go through the different components and modules of the Hadoop system and get our hands dirty with some practical activities. The main topics that we are going to cover in this chapter are as follows:
- Apache Hadoop
- HDFS
- MapReduce Framework
- YARN
We will then take a look at other Apache projects that are directly or indirectly related...