dbt Core, Cloud, BigQuery: Unveiling Data Transformation

Learn about the packages we need to download to work with dbt in this course.

We'll cover the following

dbt Core

dbt Core is a free open-source command line tool. This is the tool that we’re going to work with because it’s free and powerful.

To install dbt Core, we can use the following pip command:

Press + to interact
pip install dbt-core

Note: This course provides environments where dbt Core is already installed.

dbt Cloud

dbt Cloud is a paying tool built on top of dbt Core that offers additional functionalities such as the following:

  • It contains a web user interface with an integrated development environment.

  • It gives users an easy way to handle different environments (development, production, etc.).

  • It has a feature to schedule jobs to run at a certain time

  • It comes with built-in CI/CD workflows to facilitate deployment.

  • It automatically hosts the documentation.

In this course, we’ll work with dbt Core, but this course provides the necessary knowledge to easily switch to dbt Cloud later on.

dbt Core vs. dbt Cloud


Comparison

dbt Core

  • Free
  • Open-source

dbt Cloud

  • Provides a web UI, which makes it easier for users that are not from a technical background
  • Integrated scheduler
  • Hosted cloud environment
  • CI/CD worfklows

dbt-bigquery

dbt works with most cloud data warehouses. As different data warehouses use different SQL syntax, each dbt operation needs to be converted to the correct syntax for a given warehouse.

This conversion is handled by an adapter. Adapters also handle how dbt connects to the data warehouse, and they’re responsible for adapting dbt functions to a particular database. This is extremely powerful since learning to use dbt with one database gives us the ability to use dbt on any other database with very few adjustments.

The adapter uses the supported syntax. Some databases support the SQL IF EXISTS statement; some don’t. Previously, data engineers had to learn a different variant of SQL every time they switched databases, but dbt removed the need for that.

Here’s a list of data warehouses supported by native dbt adapters:

  • AlloyDB

  • Azure Synapse Analytics 

  • BigQuery

  • Databricks

  • Dremio  

  • PostgreSQL

  • Amazon Redshift  

  • Snowflake  

  • Apache Spark  

  • Starburst

  • Trino

The community also developed other adapters in case we want to use a non-supported data warehouse. The updated list can be found in the documentation.

In this course, we’ll use the BigQuery adapter. Most of the concepts and techniques we’ll learn can be applied to other adapters as well, but there might be some configurations that are specific to BigQuery.

The BigQuery adapter uses the gcloud library to handle connection and translates DDL and DML statements into BQL, the specific SQL used for BigQuery.

The dbt documentation provides a comprehensive list of BigQuery configurations, which we can refer to for more details and specific instructions.

To install the BigQuery adapter, we need to run the following command:

Press + to interact
pip install dbt-bigquery