GDELT dimensional modeling
As we have chosen GDELT as the dataset for analysis throughout this book, we will introduce our first example using it. First, let's select some data.
There are two streams of data available: the Global Knowledge Graph (GKG) and Events.
For this chapter, we are going to use GKG data to create a time-series dataset that can be queried from Spark SQL. This will give us a great starting point for some simple introductory analytics.
In the next chapters, Chapter 4, Exploratory Data Analysis, and Chapter 5, Spark for Geographic Analysis, we'll go into more detail while staying with GKG. Then, in Chapter 7, Building Communities, we will explore events by producing our own network graph of persons and using it in some cool analytics.
GDELT model
GDELT has been around for more than 20 years and, during that time, has undergone some significant revisions. For our introductory examples, to keep things simple, let's limit our data to the range starting from 1st April 2013, when GDELT had a major file...