We gave a basic introduction to Hadoop. The following are a few points to remember:
- Doug Cutting, the founder of Hadoop, started developing it as part of the Nutch project, based on Google's research papers on the Google File System and MapReduce.
- Apache Lucene is an open-source full-text search library, initially written in Java by Doug Cutting.
- Hadoop consists of two important parts: the Hadoop Distributed File System (HDFS) for storage and MapReduce for processing.
- YARN is a resource management framework used to schedule and run applications such as MapReduce and Spark.
- A Hadoop distribution is a complete package of open-source big data tools, integrated so that they work together efficiently.
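
As a rough illustration of the MapReduce model mentioned in the points above, here is a minimal single-process Python sketch of word counting. It does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and only mimic the map, shuffle, and reduce steps that Hadoop would distribute across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    # Map step: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle step: group all values by key, as the framework would do
    # between the map and reduce stages.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce step: combine all values for one key into a single count.
    return key, sum(values)


def word_count(lines):
    # Run the three steps in sequence inside one process (illustration only;
    # no distribution, fault tolerance, or HDFS access is involved).
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "Hadoop processes data"]))
```

In a real cluster, the input lines would typically be read from files stored in HDFS, and the map and reduce functions would run in parallel on many nodes; this sketch only shows the shape of the computation.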