- Best Practices
Best practices for Python programmers using PySpark.
We'll cover the following...
While PySpark provides a familiar environment for Python programmers, it’s good to follow a few best practices to make sure you are using Spark efficiently. Here are a set of recommendations I’ve compiled based on my experience porting a few projects from Python to PySpark.
Avoid dictionaries
Using Python data types such as dictionaries means that the code might not be executable in a distributed mode. Instead of using keys to index ...