Productizing PySpark
Tools and techniques for productizing PySpark.
Scheduling
Once you’ve tested a batch model pipeline in a notebook environment, there are several ways to schedule it to run at regular intervals.
For example, you may want a churn prediction model for a mobile game to run every morning and publish its scores to an application database. As with the workflow tools covered in the previous chapter, a PySpark pipeline should have monitoring in place to detect any failures that occur.
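To make the monitoring requirement concrete, here is a minimal sketch of a wrapper that a scheduled job could use to log failures and report a status code. It is plain Python, and every name in it (run_with_monitoring, score_churn, the retries and wait_seconds parameters) is hypothetical and not taken from this course. The body of score_churn is a placeholder for the actual PySpark scoring logic.

```python
import logging
import time

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("churn_pipeline")


def run_with_monitoring(job, job_name, retries=1, wait_seconds=0.0):
    """Run job() and log every failure with its traceback.

    Returns 0 if any attempt succeeds, or 1 once all attempts have failed,
    so the value can be used directly as a process exit code.
    """
    total_attempts = retries + 1
    for attempt in range(1, total_attempts + 1):
        try:
            job()
            logger.info("%s succeeded on attempt %d", job_name, attempt)
            return 0
        except Exception:
            # logger.exception records the message plus the full traceback.
            logger.exception("%s failed on attempt %d", job_name, attempt)
            if attempt < total_attempts:
                time.sleep(wait_seconds)
    return 1


def score_churn():
    # Hypothetical placeholder: a real job would create a SparkSession,
    # score players with the churn model, and write the results to the
    # application database.
    pass
```

A script entry point could then call sys.exit(run_with_monitoring(score_churn, "churn_scoring")), so that whichever scheduler launches the script sees a nonzero exit code when every attempt fails. Returning a status code instead of swallowing the exception keeps the failure visible to both the logs and the scheduler.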
Techniques
There are a few different approaches for scheduling PySpark jobs to run: ...