Broadcast variable
A broadcast variable is a read-only variable shared with every executor node. Once broadcast, the variable is copied to each executor's memory and can be referenced whenever it is needed during the execution of the program.
The broadcast variable is very useful when some data needs to be referenced by tasks at various stages of the program. Much like the distributed cache in Hadoop, where a lookup table can be placed for a map-side join, a broadcast variable can be used in Spark to keep lookup data available in each executor's memory.
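As an illustration of the lookup use case, here is a minimal sketch in PySpark. The text does not name a language, so Python is an assumption, and the names COUNTRY_NAMES, resolve_country, and run_with_spark, along with the sample data, are hypothetical. The sketch broadcasts a small lookup table once and lets each task read it through the broadcast handle's value attribute.

```python
# Hypothetical lookup table (not from the text): country code -> country name.
COUNTRY_NAMES = {"IN": "India", "US": "United States", "DE": "Germany"}


def resolve_country(record, lookup):
    """Replace the country code in a (user, code) pair with its full name."""
    user, code = record
    return (user, lookup.get(code, "Unknown"))


def run_with_spark():
    """Run the lookup on a local Spark master (assumes PySpark is installed)."""
    # Imported here so the helper above can be used without PySpark.
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "broadcast-lookup-sketch")
    try:
        # Register the lookup table as a read-only broadcast variable.
        names = sc.broadcast(COUNTRY_NAMES)
        users = sc.parallelize([("alice", "IN"), ("bob", "US"), ("carol", "FR")])
        # Each task reads the broadcast data through names.value.
        return users.map(lambda r: resolve_country(r, names.value)).collect()
    finally:
        sc.stop()
```

Calling run_with_spark() on a machine with PySpark installed should return each user paired with a country name, with "Unknown" for any code missing from the table. The lookup logic is kept in a plain function so it can be checked without a cluster.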
Properties of the broadcast variable
The following are the properties of the broadcast variable:
- Broadcast variables are read-only: Broadcast variables are immutable; that is, once initialized, their value cannot be changed.
- Broadcast variables get copied to executor memory at the time of creation: A broadcast variable gets cached to the executor's memory only once, at the time of creation. Therefore, it...