Summary
In this chapter, we first covered how to identify performance bottlenecks using the EXPLAIN and ANALYZE statements. Then, we discussed design optimization for performance when using tables, partitions, and indexes. We also covered data file optimization, including file format, compression, and storage. At the end of the chapter, we discussed job and query optimization in Hive. After going through this chapter, we should be able to troubleshoot and tune performance in Hive.
In the next chapter, we'll talk about function extensions for Hive.