Lexing and parsing an e-mail address
An elegant way to clean data is to define a lexer that splits a string into tokens. In this recipe, we will parse an e-mail address using the attoparsec library. This naturally allows us to ignore the surrounding whitespace.
Getting ready
Install the attoparsec parser combinator library:
$ cabal install attoparsec
How to do it…
Create a new file, which we will call Main.hs, and perform the following steps:
Use the GHC OverloadedStrings language extension to use the Text data type more legibly throughout the code. Also, import the other relevant libraries:
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text
import Data.Char (isSpace, isAlphaNum)
Declare a data type for an e-mail address:
data Email = Email
  { user :: String
  , host :: String
  } deriving Show
Define how to parse an e-mail address. This function can be as simple or as complicated as required:
email :: Parser Email
email = do
  skipSpace
  user <- many' $ satisfy isAlphaNum
  ...
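Since the parser above is cut off, the following is only a minimal sketch of one way such a parser could be completed, not necessarily the recipe's own continuation. Everything after the first `satisfy isAlphaNum` is an assumption: the address is taken to be an alphanumeric user, an `@`, an alphanumeric host name, a `.`, and an alphanumeric domain. The sketch uses `many1` instead of `many'` so that empty parts are rejected, and the helper `parseEmail` and the sample address are hypothetical names introduced for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text (Parser, char, many1, parseOnly, satisfy, skipSpace)
import Data.Char (isAlphaNum)
import Data.Text (Text)

data Email = Email
  { user :: String
  , host :: String
  } deriving (Show, Eq)

-- Assumed grammar (not from the recipe): user '@' hostname '.' domain,
-- where each part is one or more alphanumeric characters.
email :: Parser Email
email = do
  skipSpace                          -- ignore leading whitespace
  u <- many1 (satisfy isAlphaNum)    -- user part, at least one character
  _ <- char '@'
  h <- many1 (satisfy isAlphaNum)    -- host name
  _ <- char '.'
  d <- many1 (satisfy isAlphaNum)    -- domain, e.g. "com"
  return (Email u (h ++ "." ++ d))

-- Hypothetical helper: run the parser on a Text value.
parseEmail :: Text -> Either String Email
parseEmail = parseOnly email

main :: IO ()
main = do
  print (parseEmail "  alice@example.com  ")  -- surrounded by whitespace
  print (parseEmail "not an address")         -- fails at the missing '@'
```

Note that `parseOnly` does not require the whole input to be consumed, so trailing whitespace is simply left unread. A stricter variant could end the parser with `skipSpace` followed by `endOfInput`.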