Spark 2.0 uses the Tungsten engine, which builds on ideas from modern compilers and massively parallel processing (MPP) databases. At runtime it emits optimized bytecode that collapses the entire query into a single function. This eliminates virtual function calls and lets intermediate data be kept in CPU registers. The technique is called whole-stage code generation.
Reference: https://databricks.com/blog/2016/05/11/apache-spark-2-0-technical-preview-easier-faster-and-smarter.html
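To make the idea of collapsing a query into a single function more concrete, here is a minimal sketch in plain Python. It is not Spark's generated code; the class and function names (`Scan`, `Filter`, `Count`, `count_volcano`, `count_fused`) are hypothetical and exist only for this illustration. The first half models a query as a chain of operators that pull one row at a time through a `next()` method, and the second half shows the same query written as one tight loop, which is roughly the shape that whole-stage code generation aims to produce.

```python
from typing import Optional, Sequence


class Operator:
    """Iterator-style operator: next() returns one row, or None when exhausted."""

    def next(self) -> Optional[int]:
        raise NotImplementedError


class Scan(Operator):
    """Reads rows one at a time from an in-memory sequence."""

    def __init__(self, rows: Sequence[int]) -> None:
        self._it = iter(rows)

    def next(self) -> Optional[int]:
        return next(self._it, None)


class Filter(Operator):
    """Passes through only the rows equal to `wanted`."""

    def __init__(self, child: Operator, wanted: int) -> None:
        self._child = child
        self._wanted = wanted

    def next(self) -> Optional[int]:
        while True:
            row = self._child.next()  # one dynamic call per input row
            if row is None or row == self._wanted:
                return row


class Count(Operator):
    """Counts the rows produced by its child."""

    def __init__(self, child: Operator) -> None:
        self._child = child

    def result(self) -> int:
        count = 0
        while self._child.next() is not None:  # another call per row
            count += 1
        return count


def count_volcano(rows: Sequence[int], wanted: int) -> int:
    """Operator-at-a-time plan: Scan -> Filter -> Count."""
    return Count(Filter(Scan(rows), wanted)).result()


def count_fused(rows: Sequence[int], wanted: int) -> int:
    """The same query collapsed into a single function with one loop."""
    count = 0
    for value in rows:
        if value == wanted:
            count += 1
    return count
```

Both functions return the same answer; the difference is that the fused version has no per-row calls between operators, and its running count lives in a single local variable. In a compiled setting such as the JVM, that is what allows intermediate values to stay in CPU registers. Python itself does not give that benefit, so the sketch only illustrates the structure, not the speedup.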
The following chart and table show the performance improvements for single functions between Spark 1.6 and Spark 2.0:
Chart comparing performance improvements in single-line functions between Spark 1.6 and Spark 2.0
Table comparing performance improvements in single-line functions between Spark 1.6 and Spark...