What is BigQuery in Google Cloud Platform?

BigQuery is a petabyte-scale, low-cost, and serverless data warehouse in Google Cloud Platform. It is a fully managed service, so we don’t have to worry about underlying computing, network, or storage resources. Therefore, we can use SQL to answer our biggest questions without any maintenance.

BigQuery has three primary built-in features:

  1. Machine Learning using BigQuery ML
  2. Business Intelligence driven by the BI Engine
  3. Geospatial Analysis

Apart from this, BigQuery provides us with the flexibility of storage and computing. We can analyze and store the data in BigQuery, or use it to analyze data from external sources, such as Google Cloud Storage, Google Drive, etc.

BigQuery also provides automatic backups with a seven-day history of changes, and fine-grained governance and security. It ensures data encryption at rest and in transit.

Interacting with BigQuery

There are three ways to interact with BigQuery.

Web console

BQ console

The web console is a straightforward, web-based solution we can use to interact with our data. We can search, view, or query tables using the editor.

Once the results are available from the query, we can save them in multiple places such as Cloud Storage, Excel, etc. We can also download and save them locally.

Command Line Interface (CLI)

The command-line interface comes with the Google Cloud SDK. We specifically use the bq tool to interact with BigQuery. We can also use it when querying using the service account.

Client libraries

Several client libraries provide us with programmatic access to BigQuery. Currently, there are libraries for the following languages:

  • C#
  • Go
  • Java
  • Node.js
  • PHP
  • Python
  • Ruby

Data management

BQ sturcture

All the data assets reside under the project. Data sets are containers for tables and views. There can be multiple projects, data sets, tables, and views.

Data Ingestion

We can ingest data into BigQuery in two ways.

Real-time data

Real-time data is generated continuously, such as the data from sensors. It can be ingested in BigQuery using PubSub. We can also use Dataflow to receive events from PubSub, perform operations, and ingest them into the BigQuery table.

Batch data

Batch data is the data at rest, such as CSV files residing in Google Cloud storage. Loading such data is also known as “bulk load,” as we load multiple files in one shot. The straightforward way to bulk load data is to access files residing in cloud storage, process them with Dataflow, and push it to BigQuery.

Using BigQuery data

The data from BigQuery can be used in multiple ways:

  • We can create dashboards using external tools such as Tableau, or internal resources such as DataStudio, etc.
  • Other GCP, tools such as Dataflow, can use the data from BigQuery.
  • We can share the data tables with the other teams or members for querying.
  • We can export the data into a local disk, or Google Sheets, Cloud Storage, etc.
  • We can use it in DataLab, which has an interface similar to the Jupiter notebooks, where we can run Python commands and analyze the data.

We can try BigQuery for free using a sandbox. It also has public datasets, and we can query them right away.