BigQuery is a petabyte-scale, low-cost, and serverless data warehouse in Google Cloud Platform. It is a fully managed service, so we don’t have to worry about underlying computing, network, or storage resources. Therefore, we can use SQL to answer our biggest questions without any maintenance.
BigQuery has three primary built-in features:
Apart from this, BigQuery provides us with the flexibility of storage and computing. We can analyze and store the data in BigQuery, or use it to analyze data from external sources, such as Google Cloud Storage, Google Drive, etc.
BigQuery also provides automatic backups with a seven-day history of changes, and fine-grained governance and security. It ensures data encryption at rest and in transit.
There are three ways to interact with BigQuery.
The web console is a straightforward, web-based solution we can use to interact with our data. We can search, view, or query tables using the editor.
Once the results are available from the query, we can save them in multiple places such as Cloud Storage, Excel, etc. We can also download and save them locally.
The command-line interface comes with the Google Cloud SDK. We specifically use the bq
tool to interact with BigQuery. We can also use it when querying using the service account.
Several client libraries provide us with programmatic access to BigQuery. Currently, there are libraries for the following languages:
All the data assets reside under the project. Data sets are containers for tables and views. There can be multiple projects, data sets, tables, and views.
We can ingest data into BigQuery in two ways.
Real-time data is generated continuously, such as the data from sensors. It can be ingested in BigQuery using PubSub. We can also use Dataflow to receive events from PubSub, perform operations, and ingest them into the BigQuery table.
Batch data is the data at rest, such as CSV files residing in Google Cloud storage. Loading such data is also known as “bulk load,” as we load multiple files in one shot. The straightforward way to bulk load data is to access files residing in cloud storage, process them with Dataflow, and push it to BigQuery.
The data from BigQuery can be used in multiple ways:
We can try BigQuery for free using a sandbox. It also has public datasets, and we can query them right away.