In the previous section, we saw how Hadoop evolved from a simple lab experiment tool into one of the most famous projects of the Apache Software Foundation. As that evolution got under way, many commercial implementations of Hadoop emerged. Today, more than 10 different implementations exist in the market (Source). There is an ongoing debate about whether to go with fully open source Hadoop or with a commercial Hadoop implementation. Each approach has its pros and cons. Let's look at the open source approach first.
Pros of open source-based Hadoop include the following:
- With a complete open source approach, you can take full advantage of community releases.
- Because the software is free, it is easier and faster to reach customers, and the initial investment is lower.
- Open source Hadoop supports open standards, making it easy to integrate with any system.
Cons of open source-based Hadoop include the following:
- In a completely open source Hadoop scenario, it takes longer to build implementations than with commercial software, because the handy tools that speed up implementation are lacking.
- Supporting customers and fixing issues can become tedious due to the chaotic nature of the open source community.
- The roadmap of the product cannot be controlled or influenced based on business needs.
Given these challenges, companies often prefer to go with commercial implementations of Apache Hadoop. We will cover some of the key Hadoop distributions in this section.