- Best Practices

Best practices for Python programmers using PySpark.

While PySpark provides a familiar environment for Python programmers, it’s good to follow a few best practices to make sure you are using Spark efficiently. Here are a set of recommendations I’ve compiled based on my experience porting a few projects from Python to PySpark.

Avoid dictionaries

Using Python data types such as dictionaries means that the code might not be executable in a distributed mode. Instead of using keys to index ...