Create an ETL Process with Google BigQuery and Google Data Studio

 

This gist is a hybrid between a go-to cheat sheet and a tutorial for setting up a new data science project.

It shows how to create a basic Google BigQuery data source and use Google Data Studio to build a dashboard on top of it.

These tools are part of the Google Cloud Platform suite.





System Settings

Settings at the time of writing this gist (20th of March 2021).

Microsoft Windows Operating System

Microsoft Visual Studio Code

Google Cloud Platform

Main Console: https://console.cloud.google.com

Data Studio: https://datastudio.google.com/u/0/navigation/reporting

BigQuery: https://console.cloud.google.com/bigquery?project=imdb-project-307217


Start a New Project

Create a Dataset

Create a Table using BigQuery

Query the dataset

From the Schema display, click the Run a query on the table button to generate a starter query for the table in a new query tab.

NOTE: the trailing semicolon (;) at the end of a query is NOT needed.

In the (new) query tab, add the columns (variables) to include in the query, e.g. use * to select ALL of them, then click Run.

NOTE: click the More button and select Format query to automatically format the query according to good practices.
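As a rough illustration of the query described above, the following Python sketch assembles the same kind of SELECT statement as a string. The project ID comes from the BigQuery URL under System Settings and the table name from the query output tab; the dataset name `imdb_dataset`, the helper name `build_select_query`, and the default LIMIT of 1000 are assumptions made for this example only.

```python
from typing import Optional, Sequence


def build_select_query(
    project: str,
    dataset: str,
    table: str,
    columns: Optional[Sequence[str]] = None,
    limit: int = 1000,
) -> str:
    """Return a SELECT statement for a fully qualified BigQuery table.

    When no columns are given, * is used to select ALL variables.
    No trailing semicolon is added, since BigQuery does not need one.
    """
    selected = ", ".join(columns) if columns else "*"
    # Backticks are required because the project ID contains hyphens.
    return f"SELECT {selected} FROM `{project}.{dataset}.{table}` LIMIT {limit}"


# "imdb_dataset" is a hypothetical dataset name used for illustration.
query = build_select_query(
    "imdb-project-307217", "imdb_dataset", "imdb_black_white_movies"
)
print(query)
```

The printed string can be pasted into the query tab as-is; passing a list of column names instead of relying on the default narrows the selection.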

Query Output

After running the query, several options are available:

Back to the "query output table tab" (i.e. imdb_black_white_movies):

Prepare a Dashboard

Once the dashboard is built, go to File to download it as a .pdf file.

IMDB Dashboard

Also, click the following URL to access the Cloud IMDB Dashboard once permission has been granted.

Connect a data source

DS Variable Setup

A list of 18 Google Connectors appears, followed by the Partner Connectors.

DS Data Source

Explore Public Datasets

Alternatively, public datasets can be explored.

Click the Explorer menu, and click the + Add Data button. Then, select the Explore Public Datasets option. Choose the Stack Overflow dataset.
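Once the public dataset has been added, its tables can be queried like any other table. The sketch below builds one such query in Python. It assumes the dataset is exposed as `bigquery-public-data.stackoverflow` with a `posts_questions` table containing `title`, `view_count` and `creation_date` columns; check these names against the schema shown in the console before relying on them. The function name is hypothetical.

```python
# Assumed fully qualified name of the Stack Overflow questions table.
STACKOVERFLOW_QUESTIONS = "bigquery-public-data.stackoverflow.posts_questions"


def most_viewed_questions_query(year: int, limit: int = 10) -> str:
    """Build a query listing the most viewed questions created in a given year.

    Only two columns are selected, which keeps the amount of data read
    smaller than SELECT * would.
    """
    return (
        "SELECT title, view_count\n"
        f"FROM `{STACKOVERFLOW_QUESTIONS}`\n"
        f"WHERE EXTRACT(YEAR FROM creation_date) = {int(year)}\n"
        "ORDER BY view_count DESC\n"
        f"LIMIT {int(limit)}"
    )


print(most_viewed_questions_query(2020))
```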

Stack Overflow Dataset

When View Dataset is clicked, the following happens:

IMPORTANT: To avoid incurring unwanted costs, do NOT run heavy queries that scan TBs or PBs of sample data. Stick to GBs for learning!
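One way to follow this advice is to check how much data a query would scan before running it. The sketch below is a minimal example, assuming the `google-cloud-bigquery` Python package is installed and authenticated. It uses a dry run, which reports the bytes a query would process without executing it. The helper names and the 1 GiB default budget are assumptions chosen for illustration.

```python
GIB = 1024 ** 3  # number of bytes in one gibibyte


def bytes_to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes."""
    return num_bytes / GIB


def is_within_budget(num_bytes: int, max_gib: float = 1.0) -> bool:
    """Return True when the scanned volume stays within the chosen budget."""
    return bytes_to_gib(num_bytes) <= max_gib


def estimate_query_bytes(sql: str, project: str) -> int:
    """Ask BigQuery how many bytes a query would process, without running it."""
    # Imported lazily so the helpers above work without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    # dry_run validates the query and returns statistics only;
    # disabling the cache avoids a misleading estimate of 0 bytes.
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed
```

A possible usage is to call `estimate_query_bytes(sql, "imdb-project-307217")` first and only run the query when `is_within_budget` returns True for the result.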