- Sklearn Streaming 2

Build a streaming pipeline that applies a sklearn model in the second part of streaming in sklearn.

To implement the streaming model pipeline, we’ll use PySpark with a Python UDF to apply model predictions as new elements arrive.

Example

A Python UDF operates on a single row, while a Pandas UDF operates on a partition of rows. The code for this pipeline is shown in the PySpark snippet below, which first trains a model on the driver node, sets up a data sink for a Kafka stream, defines a UDF for applying an ML model, and then publishes the scores to a new topic as a pipeline output.

...

Access this course and 1400+ top-rated courses and projects.