
You're reading from ModSecurity 2.5: Prevent web application hacking with this easy to use guide

Product type: Paperback
Published: Nov 2009
Publisher: Packt
ISBN-13: 9781847194749
Length: 280 pages
Edition: 1st Edition


Our email address regex

At the beginning of the chapter I introduced a regular expression for extracting email addresses from web pages. As promised, let's use our newfound knowledge of regexes to see exactly how it works. Here, again, is the regular expression as it was presented at the beginning of the chapter:

\b[-\w.+]+@[\w.]+\.[a-zA-Z]{2,4}\b
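To experiment with the expression outside ModSecurity, here is a minimal Python sketch using the standard re module. The name EMAIL_RE and the sample text are hypothetical, made up for illustration. The re.ASCII flag is an assumption intended to keep \w and \b ASCII-only, which is closer to the default behaviour of the PCRE engine ModSecurity uses.

```python
import re

# The email regex from the text, written as a raw string so the backslashes
# in \b, \w and \. reach the regex engine unchanged.
# re.ASCII restricts \w and \b to ASCII, approximating PCRE's default.
EMAIL_RE = re.compile(r"\b[-\w.+]+@[\w.]+\.[a-zA-Z]{2,4}\b", re.ASCII)

# Hypothetical page content to scan for addresses.
page = "Contact sales@misec.net or help+web@support.misec.net for details."

# findall() returns every non-overlapping match, in order of appearance.
addresses = EMAIL_RE.findall(page)
print(addresses)  # prints ['sales@misec.net', 'help+web@support.misec.net']
```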

We noted that an email address consists of a username, an @ character, and a domain name. The first part of the regex is \b, which makes sure that the email address starts at a word boundary. Following that, the [-\w.+] character class allows a word character as well as a dash, dot, or plus sign. The dash is placed first in the class so that it is taken as a literal dash rather than as the start of a range such as a-z. The dot does not need to be escaped, because inside a character class it matches only a literal dot. Likewise, the plus sign inside the character class is interpreted as a literal plus and not as a repetition quantifier. The plus sign immediately following the character class, on the other hand, is an actual quantifier: it matches one or more occurrences of the characters within the class.
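The username part can be tested on its own. This Python sketch (the name USER_RE and the candidate strings are hypothetical) uses fullmatch() so that a string is accepted only if it consists entirely of characters from [-\w.+]. If the dot inside the class acted as a wildcard, "bad!name" would match; it does not, and the empty string fails because + requires at least one character.

```python
import re

# Username part only: the class [-\w.+] followed by the + quantifier.
# Inside the brackets, '-' (placed first), '.' and '+' are all literal.
USER_RE = re.compile(r"[-\w.+]+", re.ASCII)

candidates = ["john.smith", "first-last", "name+tag", "a", "bad!name", "sp ace", ""]

# fullmatch() succeeds only if the entire string is one or more class characters.
results = {name: USER_RE.fullmatch(name) is not None for name in candidates}

for name, ok in results.items():
    print(repr(name), "->", "match" if ok else "no match")
```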

Next, the @ character is matched literally, since every email address must contain one. After it comes a similar but narrower character class, [\w.]+, which allows only word characters and dots (no dash or plus sign). Because dots are allowed, this construct accepts an arbitrary number of sub-domains; for example, misec.net and support.misec.net are both allowed.
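The domain part can be checked in isolation the same way. This Python sketch (DOMAIN_RE and the sample domains are hypothetical) anchors [\w.]+\.[a-zA-Z]{2,4} with fullmatch() to show that any number of sub-domains is accepted. Because [\w.] contains no dash, unlike the username class, a hyphenated domain such as my-site.net is not matched, and a name without any dot, such as localhost, is rejected too.

```python
import re

# Domain part only: [\w.]+ absorbs any sub-domains (backtracking as needed),
# then \.[a-zA-Z]{2,4} must match the final dot and top-level domain.
DOMAIN_RE = re.compile(r"[\w.]+\.[a-zA-Z]{2,4}", re.ASCII)

domains = ["misec.net", "support.misec.net", "a.b.c.misec.net", "my-site.net", "localhost"]

# fullmatch() requires the whole string to fit the pattern.
results = {d: DOMAIN_RE.fullmatch(d) is not None for d in domains}

for d, ok in results.items():
    print(d, "->", "match" if ok else "no match")
```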

The second-to-last part of the regular expression is \.[a-zA-Z]{2,4}, which corresponds to the top-level domain of the email address (such as .com). The dot is required, and it is escaped so that it matches only a literal dot rather than any character. After the dot, between two and four letters are required; this allows the regex to match top-level domains such as de and com as well as four-letter domains such as info. Finally, the last part of the regex is another \b word-boundary assertion, which makes sure the match ends at a word boundary, for example before a space or punctuation mark, or at the end of the text.
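The effect of the {2,4} quantifier and the closing \b can be seen by running the complete regex against a few made-up strings. This is a minimal Python sketch (EMAIL_RE and the samples are hypothetical). Note that a trailing period is left out of the match, and that a six-letter top-level domain such as museum is rejected outright rather than truncated to muse, because \b cannot match between two letters.

```python
import re

EMAIL_RE = re.compile(r"\b[-\w.+]+@[\w.]+\.[a-zA-Z]{2,4}\b", re.ASCII)

samples = [
    "user@example.de",
    "user@example.com",
    "user@example.info",
    "Mail user@example.com.",  # trailing period is not part of the match
    "user@example.x",          # one-letter TLD: too short for {2,4}
    "user@example.museum",     # six letters: \b cannot fall between 'e' and 'u'
]

# search() returns the first match anywhere in the string, or None.
found = {}
for s in samples:
    m = EMAIL_RE.search(s)
    found[s] = m.group(0) if m else None
    print(f"{s!r} -> {found[s]!r}")
```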
