Exploring the Stack Overflow dataset with Elastic Graph
Stack Overflow is a widely used website for asking and answering questions on a very broad range of programming and computer science topics. It is a perfect resource for trying out Elastic Graph, because its data connects users to questions, answers, tags, comments, and so on. In this section, we'll index the Stack Overflow dataset in Elasticsearch, look at the structure of the data, and explore the relationships within it using Elastic Graph.
Prepare to graph!
The dataset we will use is located in the source code accompanying this book, in the Chapter 6 folder. There you will find a ZIP file called StackOverflow4Graph.zip that contains the following files:
IndexPosts.py: Python script that indexes the data in your Elasticsearch cluster
Posts.csv: The dataset itself
readme.txt: The readme file, which, by the way, contains a link to a tweet that illustrates what we are going to do in this part
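IndexPosts.py itself is not reproduced here. As a rough illustration of what indexing a CSV file into Elasticsearch can involve, the following minimal sketch turns CSV rows into a request body for the Elasticsearch _bulk API: one action line followed by one document line per row, with a trailing newline. The index name stackoverflow, the column names in the usage example, and the send_bulk helper (which assumes an unsecured cluster at http://localhost:9200) are hypothetical assumptions; the script shipped with the book may work differently.

```python
import csv
import io
import json
import urllib.request


def csv_to_bulk_ndjson(csv_text, index_name="stackoverflow"):
    """Convert CSV text into an Elasticsearch _bulk request body (NDJSON).

    Each CSV row becomes two lines: an action/metadata line telling
    Elasticsearch to index into `index_name` (a hypothetical name), and
    the row itself as a JSON document. All values stay strings, exactly
    as csv.DictReader returns them.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(row))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


def send_bulk(body, base_url="http://localhost:9200"):
    """Hypothetical helper: POST the body to _bulk on an unsecured cluster."""
    request = urllib.request.Request(
        base_url + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    # Hypothetical columns; the real Posts.csv may use different ones.
    sample = "Id,OwnerUserId,Tags\n1,42,python\n2,7,java\n"
    print(csv_to_bulk_ndjson(sample), end="")
```

For a file the size of a real Stack Overflow export, you would normally send the rows in batches of a few thousand rather than building one huge request body, and the official Python client's bulk helpers can take care of that batching for you.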
The following example gives an idea of Stack Overflow Graph...