Analyzing the Twitter stream
In the following examples, we will use the implementation of JsonLoader provided by Elephant Bird to load and manipulate JSON data. We will use Pig to explore tweet metadata and analyze trends in the dataset. Finally, we will model the interaction between users as a graph and use Apache DataFu to analyze this social network.
Prerequisites
Download the elephant-bird-pig (http://central.maven.org/maven2/com/twitter/elephantbird/elephant-bird-pig/4.5/elephant-bird-pig-4.5.jar), elephant-bird-hadoop-compat (http://central.maven.org/maven2/com/twitter/elephantbird/elephant-bird-hadoop-compat/4.5/elephant-bird-hadoop-compat-4.5.jar), and elephant-bird-core (http://central.maven.org/maven2/com/twitter/elephantbird/elephant-bird-core/4.5/elephant-bird-core-4.5.jar) JAR files from the Maven central repository and copy them onto HDFS using the following commands:
$ hdfs dfs -put target/elephant-bird-pig-4.5.jar hdfs:///jar/
$ hdfs dfs -put target/elephant-bird-hadoop-compat...
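The download step described above can also be scripted. The following is a minimal sketch, not a prescribed procedure: it assumes curl is installed and that the Maven URLs quoted in the text are still reachable. The names TARGET_DIR, jar_url, and download_jars are illustrative and do not belong to any tool.

```shell
#!/bin/sh
# Sketch: fetch the three Elephant Bird JARs named in the text into ./target.
# BASE_URL and VERSION come from the links quoted above; TARGET_DIR, jar_url,
# and download_jars are hypothetical names chosen for this example.
BASE_URL="http://central.maven.org/maven2/com/twitter/elephantbird"
VERSION="4.5"
TARGET_DIR="target"
ARTIFACTS="elephant-bird-pig elephant-bird-hadoop-compat elephant-bird-core"

# Print the download URL for one artifact, following the Maven layout
# <base>/<artifact>/<version>/<artifact>-<version>.jar
jar_url() {
  printf '%s/%s/%s/%s-%s.jar\n' "$BASE_URL" "$1" "$VERSION" "$1" "$VERSION"
}

# Download every artifact into TARGET_DIR (needs curl and network access).
download_jars() {
  mkdir -p "$TARGET_DIR"
  for artifact in $ARTIFACTS; do
    curl -fSL -o "$TARGET_DIR/$artifact-$VERSION.jar" "$(jar_url "$artifact")"
  done
}
```

Sourcing the script and calling download_jars would place the files under target/, the directory that the hdfs dfs -put commands read from.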