Accumulators and Broadcast Variables
Sharing data in a cluster
Sharing data in a distributed environment, regardless of the use case, can be confusing.
Understanding the scope (where the variables “live”) and the lifecycle (how their values change) of shared variables is challenging when code executes in a cluster.
Within the Spark ecosystem, variables can be passed down to objects that operate in a distributed fashion. However, each of those objects works on its own copy, and the copies' states diverge from one another while execution takes place.
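To make the idea of diverging copies concrete, here is a minimal sketch in plain Python. It does not use Spark at all: `pickle` merely stands in for the serialization step that ships a variable to each task, and the names `run_task`, `driver_state`, and `partitions` are hypothetical, chosen only for illustration.

```python
import pickle


def run_task(serialized_state, partition):
    """Simulate one worker task operating on its own private copy of the state."""
    state = pickle.loads(serialized_state)  # deserializing yields an independent copy
    for value in partition:
        state["counter"] += value           # mutates only this task's copy
    return state["counter"]


# "Driver" side: a plain variable we might naively expect every task to update.
driver_state = {"counter": 0}
payload = pickle.dumps(driver_state)        # the bytes shipped along with each task

# Three hypothetical partitions of data, each handled by a separate simulated task.
partitions = [[1, 2, 3], [4, 5], [6]]
local_results = [run_task(payload, p) for p in partitions]

print(local_results)  # [6, 9, 6]
print(driver_state)   # {'counter': 0}
```

Each simulated task ends with a different counter value, while the original variable on the driver side is left untouched.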
Furthermore, this is a one-way type of communication, meaning ...