Removing repeating characters
In everyday language, people are often not strictly grammatical. They will write things such as "I looooooove it" in order to emphasize the word "love". However, computers don't know that "looooooove" is a variation of "love" unless they are told. This recipe presents a method to remove these annoying repeating characters in order to end up with a proper English word.
Getting ready
As in the previous recipe, we will be making use of the re module, and more specifically, backreferences. A backreference is a way to refer to a previously matched group in a regular expression. This will allow us to match and remove repeating characters.
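To make the idea of a backreference concrete, here is a minimal sketch using only the standard re module. The pattern and the variable names are illustrative assumptions, not code taken from this recipe: the group (\w) captures one word character, and \1 only matches if that same character appears again immediately afterwards.

```python
import re

# (\w) captures a single word character; \1 is a backreference that
# must match exactly the text captured by group 1.
# The name `doubled` is hypothetical, chosen for this illustration.
doubled = re.compile(r'(\w)\1')

match = doubled.search('looooooove')
print(match.group())    # the first doubled pair: oo
print(match.group(1))   # the captured character: o

# A word without any doubled character does not match at all.
print(doubled.search('love'))  # None
```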
How to do it...
We will create a class that has the same form as the RegexpReplacer class from the previous recipe. It will have a replace() method that takes a single word and returns a more correct version of that word, with the dubious repeating characters removed. This code can be found in replacers.py in the...
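As a rough illustration of the shape such a class might take, the following sketch collapses every run of a repeated character down to a single character. It is an assumption-laden simplification and not necessarily the code in replacers.py: the class name RepeatReplacer, its attribute names, and the pattern are all hypothetical choices made for this example.

```python
import re


class RepeatReplacer:
    """Hypothetical sketch: collapse runs of a repeated character."""

    def __init__(self):
        # (\w) captures one character; \1+ matches one or more
        # immediate repetitions of that same character.
        self.repeat_regexp = re.compile(r'(\w)\1+')
        # Replace the whole run with the single captured character.
        self.repl = r'\1'

    def replace(self, word):
        # Takes a single word and returns it with repeats removed.
        return self.repeat_regexp.sub(self.repl, word)


replacer = RepeatReplacer()
print(replacer.replace('looooooove'))  # love
print(replacer.replace('goose'))       # gose
```

Note that this naive version also shortens words whose correct spelling contains a doubled letter, as the "goose" case shows, so a practical implementation would need some way to recognize when a word is already valid and should be left alone.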