Spark Clusters

Distributing workloads in Spark clusters.

Spark environment

A Spark environment is a cluster of machines with a single driver node and zero or more worker nodes. The driver machine is the master node in the cluster and is responsible for coordinating the workloads performed across the cluster.
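
As a minimal sketch, a PySpark application connects to such a cluster by creating a SparkSession on the driver. The cluster manager URL below (spark://cluster-host:7077) is a hypothetical placeholder, not an address from this course.

```python
# Minimal sketch: start a Spark application, assuming PySpark is installed
# and a standalone cluster manager is reachable at the hypothetical URL below.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cluster-demo")              # application name shown in the Spark UI
    .master("spark://cluster-host:7077")  # hypothetical cluster manager URL
    .getOrCreate()
)
```

The SparkSession object itself lives on the driver node, which then schedules tasks on the worker nodes.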

Driver and worker nodes

In general, workloads are distributed across the worker nodes when performing operations on Spark DataFrames. However, when working with plain Python objects, such as lists or dictionaries, those objects are instantiated on the driver node, as the sketch below illustrates. ...
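
The following sketch contrasts the two cases. It reuses the `spark` session from the earlier snippet; the data and column names are illustrative only.

```python
# This list lives entirely in the driver's memory as a plain Python object.
scores = [("alice", 82), ("bob", 91), ("carol", 77)]

# Converting it to a Spark DataFrame distributes the rows across the workers,
# so transformations such as filter() execute on the worker nodes.
df = spark.createDataFrame(scores, ["name", "score"])
high = df.filter(df.score > 80)

# collect() brings the results back to the driver as ordinary Python objects.
print(high.collect())
```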
