In this section, we will get started with GCP by setting up an account and a project, installing the Software Development Kit (SDK), and using BigQuery to query Wikipedia articles as a warm-up for working with big data in Google Cloud.
Setting up a GCP account and project
From a web browser, navigate to https://console.cloud.google.com/ and sign in with your personal Gmail account (if you don't have one, go ahead and create one before continuing). If you're a first-time user, you will be asked to select your country and agree to the Terms of Service, and you will also see a prompt for activating a free trial that gives you $300 in credit to explore Google Cloud. This may be a banner at the top of the page or a button on the Home page. Activate your trial, keeping in mind that you will be asked for a valid credit card number. Don't worry, though: you won't be charged, even after the trial ends, unless you manually upgrade to a paid account. There should be a clarifying statement on the trial activation page that confirms this.
Once your account has been set up and your trial has been activated, you should be directed to the Google Cloud console. A project named My First Project (or something similar) will be automatically created for you. In the top-left corner of the console's top bar, you can see which project you're currently working in:
Figure 1.4 – GCP console project view
If you click on the small down arrow next to the name of the project, it will open the project selection menu, as shown in the following screenshot. This is where you can choose which project to work on (for now, there will only be one listed). You can also create a new project from here if you wish to do so:
Figure 1.5 – GCP project selection menu
For every project in GCP, the following must be defined:
- Project Name: Set by you and doesn't need to be globally unique. Can be changed after project creation.
- Project ID: Can be set by you but needs to be globally unique and cannot be changed.
- Project Number: Assigned by GCP. It is globally unique and cannot be changed.
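Once the SDK covered in the next section is installed and initialized, these three identifiers can also be inspected from a terminal. The following is a minimal sketch rather than a required step: it assumes gcloud is on your PATH and that you are signed in, and the FIELDS variable name is only an illustrative choice.

```shell
# Show the name, ID, and number of every project your account can access.
# FIELDS is an illustrative variable holding a gcloud output format string.
FIELDS="table(name, projectId, projectNumber)"

if command -v gcloud >/dev/null 2>&1; then
  gcloud projects list --format="$FIELDS"
else
  echo "gcloud not found; install the Google Cloud SDK first" >&2
fi
```

For a brand-new account, the output should contain a single row for the automatically created project.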
For now, however, we don't need to create a new project. We will do that in Chapter 2, Mastering the Basics of Google Cloud, where we will also handle things such as billing and IAM policies. If you're already somewhat familiar with the platform, feel free to skip ahead to the next chapter!
Installing the Google Cloud SDK and using gcloud
The Google Cloud Software Development Kit (SDK) is a set of tools for interacting with the platform. This includes the gcloud, gsutil, and bq command-line tools, as well as client libraries and local emulators for developing with Google Cloud.
Go to https://cloud.google.com/sdk/install and follow the installation instructions specific to your operating system. Make sure that you complete the last step, which is to run the following on a terminal:
$ gcloud init
If you're a Windows user, you will be given the option to run this command at the end of the installation process. This command initializes gcloud and sets some default configurations. You will be asked to sign in to Google Cloud and choose a project.
If you want to see what the active configuration is, you can run the following command:
$ gcloud config list
This command should list your active configurations, such as the account and project currently being used.
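If you only need a single value from the active configuration, gcloud config get-value can read one property at a time. The snippet below is a sketch under the assumption that gcloud init has already been run; the ACTIVE_PROJECT and ACTIVE_ACCOUNT variable names and the project ID in the comment are hypothetical placeholders.

```shell
# Read individual properties from the active gcloud configuration.
if command -v gcloud >/dev/null 2>&1; then
  ACTIVE_PROJECT="$(gcloud config get-value project 2>/dev/null)"
  ACTIVE_ACCOUNT="$(gcloud config get-value account 2>/dev/null)"
  echo "Project: ${ACTIVE_PROJECT:-<unset>}"
  echo "Account: ${ACTIVE_ACCOUNT:-<unset>}"
  # To switch the default project later (hypothetical project ID):
  #   gcloud config set project my-other-project-id
else
  echo "gcloud not found; install the Google Cloud SDK first" >&2
fi
```

Capturing the values in variables like this can be handy in scripts that need to know which project they are about to act on.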
The gcloud CLI commands are organized into a nested hierarchy of command groups, each one representing a specific service or feature of the platform or one of its functional subgroups. So, for example, if you want to run commands against virtual machines, you would start with gcloud compute instances. That's how you "drill down" the hierarchy of command groups: you simply append the respective command group name. This is, of course, a very long list to memorize, but you don't have to. As you type in commands, if you're unsure what options are available at the next level, you can simply add the --help flag and you will see the sub-groups and commands you can use from that point on.
For example, if you want to see what you can do with compute instances, you can type in the following:
$ gcloud compute instances --help
In the output, you should see something similar to the following:
Figure 1.6 – gcloud help command output for compute instances
The GROUPS section lists the command sub-groups available "under" gcloud compute instances. We can see here that we can drill further down into the VM network interfaces or the VM OS inventory data and run commands against those resources.
The COMMANDS section lists the commands that can be applied at the level that you are in (in this case, compute instances). In this example, this includes things such as creating a VM, attaching a disk, adding tags, and several others not shown in the preceding screenshot.
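The drill-down pattern described above can be tried out by asking for help at each successive level. This is only a sketch: it assumes gcloud is installed, the loop and the group variable are illustrative, and head is used simply to keep the output short.

```shell
# Walk down the command hierarchy one level at a time, asking for help at
# each step: a service, one of its groups, then a command in that group.
for group in "compute" "compute instances" "compute instances create"; do
  echo "== gcloud $group --help =="
  if command -v gcloud >/dev/null 2>&1; then
    # $group is intentionally unquoted so that it splits into separate arguments.
    gcloud $group --help | head -n 20
  fi
done
```

The first two levels are command groups, so their help text lists further groups and commands, while the last one is a command, so its help text describes arguments and flags instead.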
It is also a good idea to bookmark the cheat sheet from Google so that you can quickly look up commonly used commands:
https://cloud.google.com/sdk/docs/cheatsheet.
Using the bq command-line tool and a primer on BigQuery
The bq command-line tool is a Python-based CLI for BigQuery.
BigQuery is a petabyte-scale analytics data warehouse. It is actually two services in one:
- SQL Query Engine
- Managed Storage
It therefore provides both a serverless analytics engine and storage space for data, and you don't have to manage any of the underlying infrastructure. Google also provides a number of publicly available datasets that are ready to consume, so it's very easy to get started. So, let's run a query right away to list the Wikipedia articles whose titles contain the word cloud or variations of it. Open a terminal and run the following:
$ bq query --use_legacy_sql=false \
'SELECT DISTINCT
title
FROM
`bigquery-public-data`.samples.wikipedia
WHERE
title LIKE "%cloud%"'
The first time you run the bq command, you may be asked to select a GCP project to be the default one to work with. Simply select the same project you've been working on so far.
This command is simply running a SQL query that selects the distinct values of the title column from the Wikipedia table (under the public dataset called samples), where the title text includes the substring cloud. In other words, we're asking to see all the Wikipedia article titles that include the word cloud in some way.
Your output should look like this:
Figure 1.7 – bq query output
The preceding screenshot only shows a part of the full output. As you can see, there are quite a few cloud-related articles on Wikipedia.
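If you would rather see a single number than a long list of titles, the same query can be turned into a count. The following sketch is a variation, not part of the original walkthrough: the QUERY variable and the cloud_titles alias are illustrative names, and LOWER() is added on the assumption that you also want to match capitalized titles, since LIKE is case-sensitive in standard SQL. The --dry_run flag asks BigQuery to validate the query and report how much data it would process without actually running it.

```shell
# Count distinct article titles containing "cloud", ignoring letter case.
# The query sits in single quotes so the shell leaves the backticks alone.
QUERY='SELECT
  COUNT(DISTINCT title) AS cloud_titles
FROM
  `bigquery-public-data`.samples.wikipedia
WHERE
  LOWER(title) LIKE "%cloud%"'

if command -v bq >/dev/null 2>&1; then
  # First, estimate the bytes the query would process without running it.
  bq query --use_legacy_sql=false --dry_run "$QUERY"
  # Then run it for real.
  bq query --use_legacy_sql=false "$QUERY"
else
  echo "bq not found; install the Google Cloud SDK first" >&2
fi
```

Checking the dry-run estimate first is a useful habit, because on-demand BigQuery queries are billed according to the amount of data they process.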
In this section, you signed up for GCP, installed the SDK, and got up and running with the platform. With just a short CLI command, you queried a public Wikipedia dataset in a matter of seconds, without spinning up a single VM or database and without even paying for it. This was just a taste of the power of Google Cloud. With that, you have gotten up to speed with how to interact with the platform and what some of its capabilities are.