dbt Core, Cloud, BigQuery: Unveiling Data Transformation
Learn about the packages we need to download to work with dbt in this course.
dbt Core
dbt Core is a free, open-source command-line tool. It's the tool we'll work with because it's free and powerful.
To install dbt Core, we can use the following pip command:
pip install dbt-core
Note: This course provides environments where dbt Core is already installed.
dbt Cloud
dbt Cloud is a paid tool built on top of dbt Core that offers additional functionality, such as the following:
It contains a web user interface with an integrated development environment.
It gives users an easy way to handle different environments (development, production, etc.).
It has a feature to schedule jobs to run at a certain time.
It comes with built-in CI/CD workflows to facilitate deployment.
It automatically hosts the documentation.
In this course, we'll work with dbt Core, but the knowledge we gain here will make it easy to switch to dbt Cloud later on.
dbt Core vs. dbt Cloud
| Comparison | dbt Core | dbt Cloud |
| --- | --- | --- |
| Price | Free | Paid |
| Interface | Command line | Web UI with integrated IDE |
| Environment management | Manual | Built in |
| Job scheduling | External tools required | Built in |
| CI/CD workflows | Manual setup | Built in |
| Documentation hosting | Self-hosted | Automatic |
dbt-bigquery
dbt works with most cloud data warehouses. As different data warehouses use different SQL syntax, each dbt operation needs to be converted to the correct syntax for a given warehouse.
This conversion is handled by an adapter. Adapters also handle how dbt connects to the data warehouse, and they’re responsible for adapting dbt functions to a particular database. This is extremely powerful since learning to use dbt with one database gives us the ability to use dbt on any other database with very few adjustments.
The adapter emits only the syntax the target warehouse supports. For example, some databases support the SQL IF EXISTS clause and some don't. Previously, data engineers had to learn a different variant of SQL every time they switched databases, but dbt removes the need for that.
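To see this abstraction in action, consider a hypothetical dbt model (the model and column names below are illustrative) that uses one of dbt's cross-database macros. The same model file compiles to the correct function for whichever adapter is configured, e.g., DATETIME_DIFF on BigQuery or DATEDIFF on Snowflake:

```sql
-- models/orders_shipping.sql (hypothetical model)
-- dbt.datediff is a cross-database macro: each adapter
-- compiles it to that warehouse's native date function.
select
    order_id,
    {{ dbt.datediff("ordered_at", "shipped_at", "day") }} as days_to_ship
from {{ ref('orders') }}
```

Because the dialect-specific translation happens at compile time, the model itself stays portable across warehouses.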
Here’s a list of data warehouses supported by native dbt adapters:
AlloyDB
Azure Synapse Analytics
BigQuery
Databricks
Dremio
PostgreSQL
Amazon Redshift
Snowflake
Apache Spark
Starburst
Trino
The community has also developed adapters for other data warehouses in case we want to use one that isn't natively supported. The up-to-date list can be found in the documentation.
In this course, we’ll use the BigQuery adapter. Most of the concepts and techniques we’ll learn can be applied to other adapters as well, but there might be some configurations that are specific to BigQuery.
The BigQuery adapter uses the google-cloud-bigquery Python library to handle the connection, and it translates DDL and DML statements into GoogleSQL, the SQL dialect used by BigQuery.
The dbt documentation provides a comprehensive list of BigQuery configurations, which we can refer to for more details and specific instructions.
To install the BigQuery adapter, we need to run the following command:
pip install dbt-bigquery
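Once the adapter is installed, dbt reads the BigQuery connection details from a profile. Below is a minimal sketch of a profiles.yml entry; the profile, project, and dataset names are placeholders, and the authentication method shown (oauth) assumes local gcloud credentials:

```yaml
# profiles.yml — minimal BigQuery profile sketch; names are placeholders
my_dbt_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth            # or service-account with a keyfile
      project: my-gcp-project  # GCP project ID (placeholder)
      dataset: dbt_dev         # BigQuery dataset for dbt models (placeholder)
      threads: 4               # number of concurrent model builds
```

The profile name must match the `profile` key in the project's dbt_project.yml so dbt knows which connection to use.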