ModSecurity 2.5

You're reading from   ModSecurity 2.5 Prevent web application hacking with this easy to use guide

Product type Paperback
Published in Nov 2009
Publisher Packt
ISBN-13 9781847194749
Length 280 pages
Edition 1st Edition

Our email address regex


At the beginning of the chapter I introduced a regular expression for extracting email addresses from web pages. As promised, let's use our newfound knowledge of regexes to see exactly how it works. Here, again, is the regular expression as it was presented in the beginning of the chapter:

\b[-\w.+]+@[\w.]+\.[a-zA-Z]{2,4}\b

We noted that an email address consists of a username, an @ character, and a domain name. The first part of the regex is \b, which makes sure that the email address starts at a word boundary. Following that, the [-\w.+] character class allows a word character as well as a dash, dot, or plus sign. The dash is listed first in the class, so it is treated as a literal dash rather than as a range operator. The dot does not need to be escaped here, as it is contained within a character class. Likewise, the plus sign inside the character class is interpreted as a literal plus and not as a repetition quantifier. The plus sign immediately following the character class, on the other hand, is an actual quantifier, which matches one or more occurrences of the characters within the character class.
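As a quick check of the username part, here is a minimal sketch in Python, whose re module accepts this character class with the same syntax. The helper name is_local_part is made up for illustration and is not from the book; fullmatch is used so that the entire string has to consist of characters from the class.

```python
import re

# Username portion of the email regex: one or more of
# dash, word character, literal dot, or literal plus.
LOCAL_PART = re.compile(r"[-\w.+]+")

def is_local_part(text: str) -> bool:
    """Hypothetical helper: True if the whole string matches the class."""
    return LOCAL_PART.fullmatch(text) is not None

print(is_local_part("john.doe+news"))  # dot and plus are literals here
print(is_local_part("first-last"))     # leading dash in the class is a literal
print(is_local_part("john doe"))       # a space is not in the class
print(is_local_part(""))               # the + quantifier needs at least one char
```

The first two calls print True and the last two print False.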

Following this, the @ character is matched literally, as it must be present in an email address. After this, a similar character class, [\w.]+, is used to allow an arbitrary number of sub-domains (for example, misec.net and support.misec.net are both allowed using this construct). Note that this class is narrower than the one used for the username: it contains only word characters and the dot, not the dash or the plus sign.
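The effect of the domain character class can be tried out with a short Python sketch. The addresses and the helper name matches_whole are assumptions chosen for illustration; the pattern is the one from this section, applied with fullmatch so that the whole string must be an address.

```python
import re

# The email regex exactly as presented in this section.
EMAIL = re.compile(r"\b[-\w.+]+@[\w.]+\.[a-zA-Z]{2,4}\b")

def matches_whole(address: str) -> bool:
    """Hypothetical helper: True if the entire string matches the regex."""
    return EMAIL.fullmatch(address) is not None

# Any number of sub-domains is accepted by [\w.]+
print(matches_whole("user@misec.net"))
print(matches_whole("user@support.misec.net"))

# The domain class has no dash, so a hyphenated domain is not matched.
print(matches_whole("user@my-site.com"))
```

The first two calls print True. The last prints False, which follows from the domain class containing only word characters and the dot.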

The second-to-last part of the regular expression is \.[a-zA-Z]{2,4}, and this corresponds to the top-level domain in the email address (such as .com). The dot is required, and it is escaped so that it matches only a literal dot and not any character. Following this, between two and four letters are required. This allows the regex to match two-letter top-level domains such as de, three-letter ones such as com, and four-letter ones such as info. Finally, the last part of the regex is another \b word-boundary assertion, which makes sure the email address ends at a word boundary, such as a space or a punctuation mark.
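To see the complete expression at work, the following sketch extracts addresses from a piece of text using Python's re.findall. The sample text and all addresses in it are invented for illustration; only the pattern itself comes from this chapter.

```python
import re

# The email regex exactly as presented in this section.
EMAIL_RE = re.compile(r"\b[-\w.+]+@[\w.]+\.[a-zA-Z]{2,4}\b")

# Invented sample text standing in for a web page.
page = (
    "Contact sales@example.com or support@support.misec.net. "
    "Press: jane.doe+press@example.info, not admin@localhost."
)

found = EMAIL_RE.findall(page)
print(found)
# ['sales@example.com', 'support@support.misec.net', 'jane.doe+press@example.info']

# A top-level domain longer than four letters is not matched:
# {2,4} stops after four letters and the closing \b then fails.
print(EMAIL_RE.search("user@example.museum"))  # None
```

The trailing full stop and comma are left out of the matches because the regex backtracks until the final \b falls directly after the last letter of the top-level domain. admin@localhost is skipped because it has no dot followed by two to four letters.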
