Introduction to Cloud Dataflow and Batch Modeling

Introduction to streaming model workflows.

What is Dataflow?

Dataflow is a tool for building data pipelines that can run locally or scale up to large clusters in a managed environment. Cloud Dataflow was initially incubated at Google as a GCP-specific tool, but it now builds on the open-source Apache Beam library, which makes its pipelines usable in other cloud environments.

The tool provides:

  • input connectors to different data sources, such as BigQuery and files on Cloud Storage
  • operators for transforming and aggregating data
  • output connectors to systems such as Cloud Datastore and BigQuery
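The three pieces listed above (a source, transforms and aggregations, and a sink) can be sketched with the Apache Beam Python SDK. This is a minimal local illustration and not the pipeline built later in the chapter: the record fields `user_id` and `amount` are hypothetical, `beam.Create` stands in for an input connector such as BigQuery, and printing stands in for an output connector.

```python
def to_key_value(row):
    """Transform step: turn a record into a (key, value) pair."""
    # "user_id" and "amount" are hypothetical field names used for illustration.
    return (row["user_id"], row["amount"])


def format_result(pair):
    """Shape an aggregated (key, total) pair into an output record."""
    user_id, total = pair
    return {"user_id": user_id, "total_amount": total}


def run(rows):
    """Run the pipeline locally on in-memory rows with Beam's default runner."""
    # Imported inside the function so the helpers above can be used
    # and tested without Apache Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Read" >> beam.Create(rows)              # stand-in for an input connector
            | "ToKeyValue" >> beam.Map(to_key_value)   # element-wise transform
            | "SumPerKey" >> beam.CombinePerKey(sum)   # aggregation per key
            | "Format" >> beam.Map(format_result)
            | "Write" >> beam.Map(print)               # stand-in for an output connector
        )
```

Calling `run([{"user_id": "a", "amount": 3}, {"user_id": "a", "amount": 2}])` executes the pipeline on the local machine. Moving to a managed cluster is, under these assumptions, mostly a matter of swapping the in-memory source and the print step for real connectors and passing runner options to `beam.Pipeline`.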

In this chapter, we’ll build a pipeline with Dataflow that reads in data from BigQuery, applies a sklearn model to create ...
