Chapter 7. Performance Considerations
Although Hive is built to deal with big data, we still cannot ignore the importance of performance. Most of the time, a Hive query can rely on the query optimizer to find the best execution strategy, and on the default settings that vendor packages ship as best practice. However, as experienced users, we should learn more about the theory and practice of performance tuning in Hive, especially when working on a performance-sensitive project or environment. We will start with the utilities available in Hive for finding potential causes of poor performance. Then, we introduce best practices for performance in the areas of design, file format, compression, storage, query, and job.
In this chapter, we will cover the following topics:
- Performance utilities
- Design optimization
- Data file optimization
- Job and query optimization