Log files and Excel
Let's consider a somewhat realistic use case in which you have been provided with a number of modified web log files from which you want to create some visualizations.
In Chapter 4, Addressing Big Data Quality, we will discuss data profiling (with regard to data quality), but for now, we'll assume that we know the following about our data files:
The files are of various sizes and somewhat unstructured.
The data in the files contain information logged by Internet users.
The data includes such things as computer IP addresses, a date and timestamp, and a web address (URL). The files contain more information than this, but for this exercise we only want to create a graphical representation showing the number of times each web address was hit during each month. (Software packages that provide web statistics do exist, but we'll suppose you don't have access to any of them.)
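Once each hit has been reduced to a date and a URL, the counting itself is simple. The following is a minimal Python sketch, not the method used in this chapter, assuming the records have already been extracted as (date, url) tuples (a hypothetical shape). It tallies hits per (month, URL) pair, pivots the totals into one row per URL with one column per month, and writes the result as CSV, a format Excel can open directly. The function names and the sample URLs are illustrative only.

```python
import csv
import sys
from collections import Counter
from datetime import date


def monthly_hits(records):
    """Count hits per (month, URL) pair.

    `records` is assumed to be an iterable of (date, url) tuples that have
    already been extracted from the log files (a hypothetical shape).
    """
    counts = Counter()
    for when, url in records:
        # Bucket each hit by calendar month, e.g. "2014-10".
        month = f"{when.year:04d}-{when.month:02d}"
        counts[(month, url)] += 1
    return counts


def to_table(counts):
    """Pivot the counts into rows: one per URL, one column per month."""
    months = sorted({month for month, _ in counts})
    urls = sorted({url for _, url in counts})
    rows = [["URL"] + months]
    for url in urls:
        # A Counter returns 0 for missing keys, so months with no hits show 0.
        rows.append([url] + [counts[(month, url)] for month in months])
    return rows


def write_csv(rows, fileobj):
    """Write the table as CSV so it can be opened in Excel and charted."""
    csv.writer(fileobj).writerows(rows)


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = [
        (date(2014, 10, 15), "/index.html"),
        (date(2014, 10, 16), "/index.html"),
        (date(2014, 11, 2), "/about.html"),
    ]
    write_csv(to_table(monthly_hits(sample)), sys.stdout)
```

Run as a script, the sample produces a header row `URL,2014-10,2014-11` followed by `/about.html,0,1` and `/index.html,2,0`. A month-per-column layout like this is a convenient shape for a clustered bar chart in a spreadsheet.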
The following is a sample transaction (record) from one of our files:
221.738.236 - - [15/Oct/2014:6:55:2] GET...
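To get from raw lines like this one to (date, URL) pairs, each record has to be split into fields. The sketch below is one possible approach in Python, assuming every record follows the layout of the sample: an IP field, two `-` fields, a bracketed timestamp, an HTTP method, and then the requested URL. The sample is truncated after `GET`, so treating the next token as the URL is an assumption, as are the names `RECORD_RE` and `parse_record`. Because the files are described as somewhat unstructured, lines that don't fit return `None` instead of raising an error.

```python
import re
from datetime import datetime

# Assumed layout based on the sample record: IP, two "-" fields, a bracketed
# timestamp, an HTTP method, then (hypothetically) the requested URL.
# The sample is truncated after GET, so the URL position is an assumption.
RECORD_RE = re.compile(
    r"^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] (?P<method>[A-Z]+) (?P<url>\S+)"
)


def parse_record(line):
    """Return a dict with ip, timestamp, method and url, or None if the line doesn't fit."""
    match = RECORD_RE.match(line)
    if match is None:
        return None
    try:
        # strptime accepts non-zero-padded values such as "6:55:2" for %H:%M:%S.
        ts = datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S")
    except ValueError:
        # Malformed dates (e.g. day 99) are treated as unparseable lines.
        return None
    return {
        "ip": match["ip"],
        "timestamp": ts,
        "method": match["method"],
        "url": match["url"],
    }
```

For a hypothetical completed line such as `221.738.236 - - [15/Oct/2014:6:55:2] GET /index.html`, `parse_record` returns the IP `221.738.236`, the timestamp 15 October 2014 06:55:02, and the URL `/index.html`; `record["timestamp"].strftime("%Y-%m")` then gives the month key `2014-10`. Note that `%b` expects English month abbreviations, which is what Python uses under its default C locale.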