Analyzing web log data using Pig
In this recipe, we will learn how to use Pig scripts to analyze web log data.
Getting ready
To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Pig installed on it.
How to do it...
In the previous chapter, we saw how to analyze web logs using a MapReduce program. In this recipe, we are going to take a look at how to use Pig scripts to analyze web log data. Let's consider two use cases:
Here is a sample of web log data:
106.208.17.105 - - [12/Nov/2015:21:20:32 -0800] "GET /tutorials/mapreduce/advanced-map-reduce-examples-1.html HTTP/1.1" 200 0 "https://www.google.co.in/" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36"
60.250.32.153 - - [12/Nov/2015:21:42:14 -0800] "GET /tutorials/elasticsearch/install-elasticsearch-kibana-logstash-on-windows.html HTTP/1.1" 304 0 - "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490...
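Before writing any Pig script against this data, it can help to see which fields a single log line carries. The following is a minimal Python sketch, not part of the recipe's Pig code, that splits the first sample line into named fields. It assumes the logs follow the Apache combined log format, which the sample appears to match; the field names (`ip`, `path`, `status`, and so on) and the `parse_line` helper are illustrative choices, not names taken from the recipe.

```python
import re

# Assumed layout (Apache combined log format):
# host identity user [timestamp] "method path protocol" status size referrer "user agent"
# The group names below are illustrative, not defined by the recipe.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'(?:"(?P<referrer>[^"]*)"|-) '   # referrer is either quoted or a bare "-"
    r'"(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


# First entry from the sample data above.
sample = (
    '106.208.17.105 - - [12/Nov/2015:21:20:32 -0800] '
    '"GET /tutorials/mapreduce/advanced-map-reduce-examples-1.html HTTP/1.1" '
    '200 0 "https://www.google.co.in/" '
    '"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36"'
)

record = parse_line(sample)
print(record["ip"], record["status"], record["path"])
```

Lines that do not fit the assumed layout yield `None` instead of raising an error, so malformed entries can be counted or skipped. When the referrer is a bare `-`, as in the second sample entry, the `referrer` field comes back as `None`.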