Chapter 2. Integrity and Inspection
This chapter will cover the following recipes:
- Trimming excess whitespace
- Ignoring punctuation and specific characters
- Coping with unexpected or missing input
- Validating records by matching regular expressions
- Lexing and parsing an e-mail address
- Deduplication of nonconflicting data items
- Deduplication of conflicting data items
- Implementing a frequency table using Data.List
- Implementing a frequency table using Data.MultiSet
- Computing the Manhattan distance
- Computing the Euclidean distance
- Comparing scaled data using the Pearson correlation coefficient
- Comparing sparse data using cosine similarity