
Accumulators and Broadcast Variables


Sharing data in a cluster

Sharing data in a distributed environment, regardless of the use case, can be confusing.

Understanding the scope (where variables “live”) and the lifecycle (how their values change) of shared variables while code executes in a cluster is a challenging task.

In Spark, variables can be passed to functions and objects that operate in a distributed fashion. However, what each of them receives is a copy, and every copy can end up in a different state while execution takes place.
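To make the "copy per task" behavior concrete, here is a minimal plain-Python simulation. It is not Spark code. It assumes, as a simplification, that shipping captured state to executors can be modeled as a `pickle` serialization round trip, with one fresh copy per partition. The names `run_tasks`, `add_to_counter`, and `driver_state` are hypothetical and exist only for illustration.

```python
import pickle
from typing import Callable, Dict, List

# Simulation only (not the Spark API): a pickle round trip stands in for
# Spark serializing captured state and sending it to each task.


def run_tasks(partitions: List[List[int]],
              state: Dict[str, int],
              task: Callable[[Dict[str, int], List[int]], None]) -> List[Dict[str, int]]:
    """Run `task` once per partition, each on its own deserialized copy of `state`.

    Returns the task-local copies so their diverging values can be inspected.
    """
    payload = pickle.dumps(state)            # serialize the captured state once
    copies = []
    for part in partitions:
        local_state = pickle.loads(payload)  # every task gets an independent copy
        task(local_state, part)
        copies.append(local_state)
    return copies


def add_to_counter(state: Dict[str, int], part: List[int]) -> None:
    # Mutates only the task-local copy; nothing flows back to the driver.
    for x in part:
        state["counter"] += x


driver_state = {"counter": 0}
partitions = [[1, 2, 3], [4, 5], [6]]
task_copies = run_tasks(partitions, driver_state, add_to_counter)

print(driver_state["counter"])              # 0: the driver's value is untouched
print([c["counter"] for c in task_copies])  # [6, 9, 6]: each copy diverged
```

The same pitfall appears in real Spark code when a driver-side counter is incremented inside `foreach`: each executor updates its own copy, and the driver's value does not reflect those updates.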

Furthermore, this communication is one-way, meaning ...