In the Cloud with Snowflake and dbt
Explore how to integrate entity resolution pipelines into cloud platforms like Snowflake using the dbt framework. Learn how to combine the SQL and Python runtimes to preprocess and deduplicate large datasets efficiently, reusing existing data transformation infrastructure for cross-referencing.
Snowflake, Databricks, and BigQuery are three competing SQL-first analytics platforms, and many companies have bought into one of them. Many teams have also adopted the dbt framework to author data transformation jobs. Combining dbt with one of these SaaS platforms is an excellent fit for integrating entity resolution workflows authored in Python because all three support Python as a secondary language.
The bigger picture
Snowflake, Databricks, BigQuery, and many similar platforms abstract away their distributed nature. From a user’s perspective, all the raw data sits in one place, and we choose between SQL and Python to transform it.
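To make the Python option concrete, here is a minimal sketch of what a dbt Python model for a preprocessing step might look like on Snowflake. It is an illustration under assumptions, not the lesson's actual pipeline: the upstream model name customers, the column names NAME and NAME_CLEAN, and the helper normalize_name are all hypothetical. Only the model(dbt, session) entry point, dbt.config, dbt.ref, and Snowpark's to_pandas come from dbt and Snowflake themselves.

```python
import re
import unicodedata


def normalize_name(raw):
    """Lowercase a name, strip accents and punctuation, collapse whitespace.

    Hypothetical helper for illustration; real preprocessing rules depend
    on the dataset.
    """
    # Treat None and NaN as empty (NaN is the only value not equal to itself).
    if raw is None or raw != raw:
        return ""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", str(raw))
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace every run of characters other than a-z, 0-9, or space with a space.
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", unaccented.lower())
    return " ".join(cleaned.split())


def model(dbt, session):
    # dbt calls this function and materializes the returned DataFrame.
    dbt.config(materialized="table")
    # "customers" is an assumed upstream model. On Snowflake, dbt.ref returns
    # a Snowpark DataFrame; to_pandas() loads it into a pandas DataFrame.
    df = dbt.ref("customers").to_pandas()
    # "NAME" and "NAME_CLEAN" are assumed column names.
    df["NAME_CLEAN"] = df["NAME"].map(normalize_name)
    return df
```

Saved as a .py file in the project's models folder, a model like this can be referenced from SQL models in the same way as any other dbt model, which is what lets the Python step reuse the existing transformation infrastructure.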