- Sklearn Streaming 2
In this second part on streaming with sklearn, we build a streaming pipeline that applies a sklearn model to incoming events.
To implement the streaming model pipeline, we’ll use PySpark with a Python UDF to apply model predictions as new elements arrive.
Example
A Python UDF operates on a single row, while a Pandas UDF operates on a partition of rows. The PySpark code for this pipeline first trains a model on the driver node, then connects to a Kafka stream as the input, defines a UDF for applying the ML model, and finally publishes the scores to a new topic as the pipeline output.
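The lesson's own snippet is not reproduced here, so the following is only a minimal sketch of the four steps just described, not the original code. The topic names, feature names, message layout (one JSON object per Kafka message), checkpoint path, and the choice of logistic regression are all assumptions made for illustration. The sketch also assumes the Spark Kafka connector package is available to the cluster.

```python
import json

# Placeholder from the text; 9092 is Kafka's default broker port.
KAFKA_SERVERS = "{external IP}:9092"
# Hypothetical names, not taken from the original lesson.
FEATURE_NAMES = ["f1", "f2", "f3"]
INPUT_TOPIC, OUTPUT_TOPIC = "events", "scores"


def train_model(X, y):
    """Fit a model on the driver (logistic regression is an assumption)."""
    from sklearn.linear_model import LogisticRegression
    return LogisticRegression().fit(X, y)


def score_message(model, payload):
    """Score one JSON message; this is the per-row body of the Python UDF."""
    event = json.loads(payload)
    features = [[float(event[name]) for name in FEATURE_NAMES]]
    prob = model.predict_proba(features)[0][1]
    return json.dumps({"id": event.get("id"), "score": float(prob)})


def start_pipeline(spark, model, checkpoint_dir="/tmp/scores-checkpoint"):
    """Read events from Kafka, score each row, publish to a new topic."""
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    events = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", KAFKA_SERVERS)
              .option("subscribe", INPUT_TOPIC)
              .load())

    # The fitted model is captured in the closure and shipped to executors.
    score_udf = udf(lambda value: score_message(model, value), StringType())
    scored = events.select(
        score_udf(col("value").cast("string")).alias("value"))

    # The Kafka sink expects a "value" column and a checkpoint location.
    return (scored.writeStream.format("kafka")
            .option("kafka.bootstrap.servers", KAFKA_SERVERS)
            .option("topic", OUTPUT_TOPIC)
            .option("checkpointLocation", checkpoint_dir)
            .start())
```

Keeping the scoring logic in a plain function such as score_message makes it possible to check it without a Spark session or a Kafka broker; start_pipeline only wires that function into the stream.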
Replace {external IP} with the public IP of your machine or EC2 instance.