- Natality Streaming

Creating a streaming pipeline with Dataflow using the Natality dataset.

PubSub can provide both data sources and data sinks within a Dataflow pipeline: a consumer that reads messages from a subscription acts as a data source, while a publisher that writes messages to a topic acts as a data sink.
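As a minimal sketch of these two roles, the snippet below reads from a PubSub subscription and writes back to a PubSub topic; the project, subscription, and topic names are placeholders rather than values from this lesson.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

# Streaming pipelines must enable the streaming option.
options = PipelineOptions()
options.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=options) as pipeline:
    (pipeline
     # Consumer: reading from a subscription makes PubSub the data source.
     | 'Read' >> beam.io.ReadFromPubSub(
           subscription='projects/my-project/subscriptions/my-sub')
     # Publisher: writing to a topic makes PubSub the data sink.
     | 'Write' >> beam.io.WriteToPubSub(
           topic='projects/my-project/topics/my-topic'))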

Example

We’ll reuse the Natality dataset to create a pipeline with Dataflow, but for the streaming version, we’ll use a PubSub consumer as the input data source rather than a BigQuery result set.

Defining functions

For the output, we’ll publish predictions to Datastore and reuse the publish DoFn from the previous chapter.
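As a reminder, a rough sketch of what such a publish DoFn does is shown below; it assumes the google.cloud.datastore client and a 'natality-guid' entity kind, both of which are assumptions here rather than details confirmed in this lesson. The actual DoFn is defined in the previous chapter.

import apache_beam as beam

class PublishDoFn(beam.DoFn):

    def __init__(self):
        # Import the Datastore client on the worker, as with the other DoFns.
        from google.cloud import datastore
        self._ds = datastore

    def process(self, element):
        # Write each prediction as a Datastore entity keyed by the record's guid.
        # The 'natality-guid' kind name is an assumption for this sketch.
        client = self._ds.Client()
        key = client.key('natality-guid', element['guid'])
        entity = self._ds.Entity(key)
        entity['weight'] = element['weight']
        entity['time'] = element['time']
        client.put(entity)

Next, we define the DoFn that performs the model application in the streaming pipeline.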

import apache_beam as beam
import argparse
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.gcp.bigquery import parse_table_schema_from_json
import json

class ApplyDoFn(beam.DoFn):

    def __init__(self):
        self._model = None
        from google.cloud import storage
        import pandas as pd
        import pickle as pkl
        import json as js
        self._storage = storage
        self._pkl = pkl
        self._pd = pd
        self._json = js

    def process(self, element):
        # Lazily load the model from Cloud Storage on the first call.
        if self._model is None:
            bucket = self._storage.Client().get_bucket('dsp_model_store')
            blob = bucket.get_blob('natality/sklearn-linear')
            self._model = self._pkl.loads(blob.download_as_string())

        # Decode the PubSub message payload and parse it into a dictionary.
        element = self._json.loads(element.decode('utf-8'))
        new_x = self._pd.DataFrame.from_dict(element,
                        orient="index").transpose().fillna(0)

        # Apply the model and emit the prediction with the record's guid and time.
        weight = self._model.predict(new_x.iloc[:, 1:8])[0]
        return [{'guid': element['guid'], 'weight': weight,
                 'time': str(element['time'])}]

The code snippet above shows the function we’ll use to perform the model application in the streaming pipeline. This function is the same as the one we defined in Cloud Dataflow and Batch Modeling, with one modification: the json.loads function is used to convert the passed-in message string into a dictionary object. ...
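To illustrate that modification: messages arrive from PubSub as UTF-8 encoded bytes, so the payload is first decoded and then parsed into a dictionary. The record below is a made-up example with only a few fields, used solely to demonstrate the parsing step.

import json

# A hypothetical message payload as it would arrive from PubSub (raw bytes).
payload = json.dumps({'guid': 'example-guid', 'weight': 7.5,
                      'time': '2020-01-01 12:00:00'}).encode('utf-8')

# The same conversion performed inside process(): decode the bytes, then parse the JSON.
element = json.loads(payload.decode('utf-8'))
print(element['guid'], element['weight'])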