This chapter introduces Project Tungsten. Solid state disks (SSDs) and 10 Gbps Ethernet greatly increased I/O performance. As a result, CPU and main memory performance became the new bottlenecks in big data processing. Project Tungsten, the core of the Apache Spark execution engine, therefore aims to improve performance at the CPU and main memory level. This chapter covers the following topics:
- Memory management beyond the Java Virtual Machine (JVM) Garbage Collector (GC)
- Cache-friendly layout of data in memory
- Code generation
We will now take an in-depth look at each of these three topics.