Writing a regular expression for parsing
The logs look complex. Here's a sample line from a log:
109.128.44.217 - - [31/May/2015:22:55:59 -0400] "GET / HTTP/1.1" 200 14376 "-" "Mozilla/5.0 (iPad; CPU OS 8_1_2 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B440 Safari/600.1.4"
How can we pick this apart? Python offers us regular expressions, through its re module, as a way to describe (and parse) this string of characters.
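Here is one way the sample line could be picked apart with Python's re module. This is a minimal sketch that assumes the logs follow the Apache "combined" log format. The pattern name LOG_PATTERN and the group names (host, identity, user, time, request, status, size, referrer, agent) are our own illustrative choices. They are not defined by the logs themselves.

```python
import re

# Hypothetical pattern, assuming the Apache "combined" log format:
# host identity user [timestamp] "request" status size "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# The sample line from above, split across source lines for readability.
sample = (
    '109.128.44.217 - - [31/May/2015:22:55:59 -0400] "GET / HTTP/1.1" 200 14376 "-" '
    '"Mozilla/5.0 (iPad; CPU OS 8_1_2 like Mac OS X) AppleWebKit/600.1.4 '
    '(KHTML, like Gecko) Version/8.0 Mobile/12B440 Safari/600.1.4"'
)

match = LOG_PATTERN.match(sample)
if match:
    # groupdict() maps each named group to the text it captured.
    fields = match.groupdict()
    for name, value in fields.items():
        print(f"{name:>8}: {value}")
```

Each bracketed or quoted field uses a "anything except the closing delimiter" class such as [^"]*. That is what lets the user agent contain spaces, parentheses, and semicolons without breaking the match.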
We write a regular expression to define a set of strings. The set can be very small, containing only a single string, or it can be large and describe an infinite number of related strings. This raises two problems we have to overcome: how do we specify an infinite set, and how do we separate the characters that help specify a rule from the characters that simply mean themselves?
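Python's re module answers both questions with metacharacters. A quantifier such as + lets a short pattern describe an infinitely large set, and a backslash marks a character that should mean only itself. The small sketch below illustrates both ideas. The variable names are just for illustration, and the "8_1" string is borrowed from the iPad user agent in the sample log line.

```python
import re

# "+" is a metacharacter meaning "one or more of the preceding item",
# so this two-character pattern describes the infinite set a, aa, aaa, ...
repeat = re.compile(r"a+")
for candidate in ["a", "aa", "aaaaaaaa", "", "b"]:
    print(repr(candidate), bool(repeat.fullmatch(candidate)))

# "." is a metacharacter meaning "any single character" ...
any_char = re.compile(r"8.1")
# ... so we escape it with a backslash when we want a literal period.
literal_dot = re.compile(r"8\.1")

print(bool(any_char.fullmatch("8_1")))     # True: "." matched "_"
print(bool(literal_dot.fullmatch("8_1")))  # False: "\." matches only "."
print(bool(literal_dot.fullmatch("8.1")))  # True

# re.escape() backslash-escapes special characters so the text is matched literally.
print(re.escape("8.1"))  # 8\.1
```

Raw strings (r"...") keep Python from interpreting the backslashes before the re module sees them, which is why regular expression patterns are usually written that way.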
For example, we might write the regular expression aabr. This specifies a set that contains a single string. This regular expression looks like the mathematical expression a×a×b×r that...