Web log analytics
Web logs are data generated by the web servers that run a website. This use case applies to any domain where a company hosts a website and wants to learn more about the site's performance and about customer behavior on it.
Getting ready
To perform this recipe, you should have a running Hadoop cluster. I have uploaded some sample web log data to
https://github.com/deshpandetanmay/hadoop-real-world-cookbook/blob/master/data/mylog.txt.
How to do it...
Before jumping into the solution, let's first try to understand the problem statement:
Problem statement
Many companies run their businesses through their websites, so website performance has a large effect on sales and profitability. Web servers generally log information about each request, such as the user, browser, and IP address. We can use this information to make the browsing experience smoother for users, which in turn helps increase profitability.
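To make the kind of information held in a web log concrete, here is a minimal Python sketch that pulls the IP address, user, and browser (user agent) out of a few log lines and counts requests per value. It assumes the lines follow the Apache combined log format; the layout of the sample file used in this recipe may differ. The names LOG_PATTERN, parse_line, and count_by_field are hypothetical helpers, and the SAMPLE lines are made up for illustration only.

```python
import re
from collections import Counter

# Assumption: lines follow the Apache "combined" log format.
# The sample file for this recipe may use a different layout.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def count_by_field(lines, field):
    """Count how often each value of `field` occurs across parseable lines."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:  # silently skip lines that do not parse
            counts[record[field]] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE = [
    '192.0.2.10 - - [10/Oct/2015:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [10/Oct/2015:13:56:01 +0000] "GET /cart HTTP/1.1" 200 512 "http://example.com/index.html" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2015:13:57:12 +0000] "GET /index.html HTTP/1.1" 404 0 "-" "curl/7.43.0"',
]

if __name__ == "__main__":
    # Requests per client IP and per browser string.
    print(count_by_field(SAMPLE, "ip").most_common())
    print(count_by_field(SAMPLE, "agent").most_common())
```

On a real site, the same per-field counting would be run over the full log on the cluster rather than over an in-memory list; the sketch only shows which fields are available to analyze.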
Solution
Here, we assume that a company hosting its website on...