Gauging oil prices
Now that we have a substantial amount of data in our data store (we can always add more using the preceding Spark job), we will query that data using the GeoMesa API to prepare the rows for our learning algorithm. We could of course use the raw GDELT files directly, but the following method is a useful tool to have available.
Using the GeoMesa query API
The GeoMesa query API enables us to query for results based upon spatio-temporal attributes, whilst also leveraging the parallelization of the data store, in this case Accumulo with its iterators. We can use the API to build SimpleFeatureCollections, which we can then parse to realize GeoMesa SimpleFeatures and ultimately the raw data that matches our query.
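As a rough illustration of what a spatio-temporal query can look like, the sketch below assembles an ECQL predicate that combines a bounding box with a time window. It is plain Java with no GeoMesa dependency. The class name EcqlFilterSketch, the method bboxDuring, and the geometry attribute name geom are hypothetical, and the sketch assumes that SQLDATE was ingested as a date-typed attribute. In a real job, a string like this would typically be parsed with GeoTools' ECQL.toFilter and wrapped in a Query against the feature type.

```java
// Minimal sketch: builds an ECQL predicate string for a spatio-temporal query.
// All names here (class, method, "geom") are illustrative assumptions.
final class EcqlFilterSketch {
    private EcqlFilterSketch() {}

    /**
     * Combines a bounding-box predicate with a DURING time-window predicate.
     * Timestamps are expected as ISO-8601 strings, e.g. 2014-01-01T00:00:00Z.
     */
    static String bboxDuring(String geomAttr,
                             double minLon, double minLat,
                             double maxLon, double maxLat,
                             String dateAttr, String startIso, String endIso) {
        // Spatial part: BBOX(attribute, minX, minY, maxX, maxY)
        String bbox = "BBOX(" + geomAttr + ", " + minLon + ", " + minLat
                + ", " + maxLon + ", " + maxLat + ")";
        // Temporal part: attribute DURING start/end
        String during = dateAttr + " DURING " + startIso + "/" + endIso;
        return bbox + " AND " + during;
    }

    public static void main(String[] args) {
        // Whole-world bounding box over a one-year window (example values only).
        System.out.println(bboxDuring("geom", -180, -90, 180, 90,
                "SQLDATE", "2014-01-01T00:00:00Z", "2014-12-31T23:59:59Z"));
    }
}
```

Keeping the attribute names and bounds as parameters means the same helper can be reused if the spatial extent or time range needs to change later.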
At this stage we should build code that is generic, such that we can change it easily should we decide later that we have not used enough data, or perhaps if we need to change the output fields. Initially, we will extract a few fields: SQLDATE, Actor1Name...