
In the Cloud with Snowflake and dbt

Explore how to integrate entity resolution pipelines within cloud platforms like Snowflake using the dbt framework. Understand how to combine SQL and Python runtimes to preprocess and deduplicate large datasets efficiently, reusing existing data transformation infrastructure for effective cross-referencing.

Snowflake, Databricks, and BigQuery are three competing SQL-first analytics platforms, and many companies have bought into one of them. Many teams have also adopted the dbt framework to author data transformation jobs. The combination of dbt with any of these three SaaS platforms is an excellent fit for integrating entity resolution workflows authored in Python because all three support Python as a secondary language.

The bigger picture

Snowflake, Databricks, BigQuery, and many similar platforms abstract away their distributed nature. From a user’s perspective, all the raw data sits in one place, and we choose between SQL and Python to transform it.
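To make the Python side of this choice concrete, here is a minimal sketch of a dbt Python model for Snowflake. It is an illustration under stated assumptions, not the lesson's actual pipeline: the upstream model `stg_customers` and its `name` and `email` columns are hypothetical, and the exact-match deduplication on normalized fields stands in for a real entity resolution step, which would typically use fuzzy matching.

```python
import re


def normalize(value):
    """Lowercase, trim, and collapse repeated whitespace in a field."""
    return re.sub(r"\s+", " ", str(value or "")).strip().lower()


def deduplicate(rows):
    """Keep the first row for each normalized (name, email) key."""
    seen = {}
    for row in rows:
        key = (normalize(row["name"]), normalize(row["email"]))
        seen.setdefault(key, row)  # later duplicates are ignored
    return list(seen.values())


def model(dbt, session):
    # Entry point that dbt calls for a Python model.
    # `stg_customers` is a hypothetical upstream model name.
    dbt.config(materialized="table", packages=["pandas"])
    import pandas as pd  # imported here so the helpers above run without it

    df = dbt.ref("stg_customers").to_pandas()
    # Snowflake returns unquoted column names in upper case.
    df.columns = [c.lower() for c in df.columns]
    return pd.DataFrame(deduplicate(df.to_dict("records")))
```

In a dbt project, a file like this would sit in the models directory as a `.py` file next to the SQL models, and a downstream SQL model could select from its output through `ref`. Keeping the matching logic in plain functions such as `deduplicate` also makes it testable outside the platform.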

Architecture of an analytics platform using Snowflake and its Python and SQL runtimes
...