Writing a MapReduce program in Java to analyze web log data
In this recipe, we are going to take a look at how to write a MapReduce program to analyze web logs. Web logs are records that web servers generate for the requests they receive. There are various web servers, such as Apache, Nginx, Tomcat, and so on, and each one logs data in its own format. In this recipe, we are going to use data from the Apache Web Server, which is in the combined access log format.
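To make the log format concrete, here is a minimal sketch of how a single combined access log line could be split into its fields in plain Java. The class name CombinedLogLineParser, the method requestedPage, and the sample line are illustrative assumptions and are not taken from the recipe's own code or data. The regular expression assumes the usual combined layout of host, identity, user, timestamp, request, status, bytes, referer, and user agent.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the recipe's code: extracts the requested
// page from one line written in the Apache combined access log format.
public class CombinedLogLineParser {

    // Assumed layout:
    // host ident user [time] "method path protocol" status bytes "referer" "user-agent"
    private static final Pattern COMBINED = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] "
        + "\"(\\S+) (\\S+)(?: (\\S+))?\" (\\d{3}) (\\S+) "
        + "\"([^\"]*)\" \"([^\"]*)\"$");

    /** Returns the requested path, or null if the line does not match the pattern. */
    public static String requestedPage(String line) {
        Matcher m = COMBINED.matcher(line);
        return m.matches() ? m.group(6) : null;
    }

    public static void main(String[] args) {
        // Illustrative sample line only; it is not the recipe's input data.
        String sample = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326 "
            + "\"http://www.example.com/start.html\" "
            + "\"Mozilla/4.08 [en] (Win98; I ;Nav)\"";
        System.out.println(requestedPage(sample)); // prints /apache_pb.gif
    }
}
```

Returning null for lines that do not match lets a caller skip malformed records instead of failing, which is usually what you want when processing large log files.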
Note
To read more on combined access logs, refer to
Getting ready
To perform this recipe, you should already have a running Hadoop cluster as well as an IDE such as Eclipse.
How to do it...
We can write MapReduce programs to analyze various aspects of web log data. In this recipe, we are going to write a MapReduce program that reads a web log file and outputs the pages viewed along with their view counts. Here is some sample web log data we'll consider as input for our program:
106.208.17...