Spark 2.0 uses the Tungsten engine, which is built on ideas from modern compilers and MPP databases. At runtime it emits optimized bytecode that collapses the entire query into a single function. This eliminates virtual function calls between operators and lets intermediate data stay in CPU registers instead of being written to memory. The technique is called whole-stage code generation.
![](https://static.packt-cdn.com/products/9781785889936/graphics/assets/image_03_004.png)
Source: https://databricks.com/blog/2016/05/11/apache-spark-2-0-technical-preview-easier-faster-and-smarter.html
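To make the idea concrete, here is a minimal sketch in plain Python that contrasts an operator-at-a-time plan with a single fused function for a "count the rows whose item ID matches" query. It is only an illustration of the concept, not the code Spark generates (Spark emits JVM bytecode). The class names `Scan`, `Filter`, and `Count`, the two functions, and the sample data are all hypothetical names chosen for this example.

```python
from typing import Callable, Iterator, List


class Scan:
    """Leaf operator: yields rows from an in-memory list."""

    def __init__(self, rows: List[int]) -> None:
        self.rows = rows

    def __iter__(self) -> Iterator[int]:
        for row in self.rows:
            yield row


class Filter:
    """Passes through only the rows that satisfy the predicate."""

    def __init__(self, child, predicate: Callable[[int], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def __iter__(self) -> Iterator[int]:
        # Each row costs an indirect call into the child operator
        # plus a call to the predicate.
        for row in self.child:
            if self.predicate(row):
                yield row


class Count:
    """Root operator: counts the rows produced by its child."""

    def __init__(self, child) -> None:
        self.child = child

    def execute(self) -> int:
        total = 0
        for _ in self.child:
            total += 1
        return total


def count_matching_volcano(item_ids: List[int], target: int) -> int:
    """Operator-at-a-time plan: Count <- Filter <- Scan."""
    plan = Count(Filter(Scan(item_ids), lambda item_id: item_id == target))
    return plan.execute()


def count_matching_fused(item_ids: List[int], target: int) -> int:
    """The same query collapsed into one loop, as a code generator might
    emit it: no per-row calls between operators, and the running count
    lives in a single local variable."""
    count = 0
    for item_id in item_ids:
        if item_id == target:
            count += 1
    return count


# Hypothetical sample data: three of the five rows match.
sample_item_ids = [1000, 7, 1000, 42, 1000]
volcano_result = count_matching_volcano(sample_item_ids, 1000)
fused_result = count_matching_fused(sample_item_ids, 1000)
print(volcano_result, fused_result)
```

Both functions return the same answer. The difference is in how the work is organized: the first pays for a chain of calls per row, while the second is the kind of tight loop that whole-stage code generation aims to produce for an entire query stage.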
The following chart and table show the performance improvements for single-line functions between Spark 1.6 and Spark 2.0:
![](https://static.packt-cdn.com/products/9781785889936/graphics/assets/image_03_005.png)
Chart comparing performance improvements in single-line functions between Spark 1.6 and Spark 2.0
![](https://static.packt-cdn.com/products/9781785889936/graphics/assets/image_03_006.png)
Table comparing performance improvements in single-line functions between Spark 1.6 and Spark...