Using the Accumulo database
We have seen a method to read our personRdd object from Elasticsearch, and this forms a simple and neat solution for our storage requirements. However, when writing commercial applications, we must always be mindful of security and, at the time of writing, Elasticsearch security is still in development, so it would be useful at this stage to introduce a storage mechanism with native security. This is an important consideration even though the GDELT data we are using is, of course, open source by definition: in a commercial environment, it is very common for datasets to be confidential or commercially sensitive in some way, and clients will often request details of how their data will be secured long before they discuss the data science aspect itself. It is the authors' experience that many a commercial opportunity is lost because solution providers are unable to demonstrate a robust and secure data architecture.
Accumulo (http://accumulo.apache.org) is a NoSQL database...