The nitty-gritty of cleaning text
Strings are the foundation of text processing, so using a good string library is important. Unfortunately, the java.lang.String class has some limitations. To address them, you can either implement your own specialized string functions as needed or use a third-party library.
Creating your own library can be useful, but you will basically be reinventing the wheel. It may be faster to write a short code sequence to implement some piece of functionality, but to do things right, you will also need to test it. Third-party libraries have already been tested and have been used on hundreds of projects. They provide a more efficient way of processing text.
There are several text processing APIs in addition to those found in Java. We will demonstrate two of these:
Apache Commons: https://commons.apache.org/
Guava: https://github.com/google/guava
Java provides considerable support for cleaning text data, including methods in the String class. These methods are ideal for simple...
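As a rough illustration of the kind of simple cleaning the String class can handle on its own, the following sketch lowercases the text, replaces anything that is not a letter, digit, or whitespace with a space, trims the result, and splits it into words. The class name SimpleTextCleaner and the method clean are hypothetical names chosen for this example, and the cleaning rules (ASCII letters and digits only) are an assumption rather than a recommendation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper used only for illustration; not part of any library.
public class SimpleTextCleaner {

    // Returns the lowercase words of the text, with punctuation removed.
    public static List<String> clean(String text) {
        String normalized = text
                .toLowerCase(Locale.ROOT)          // locale-independent lowercasing
                .replaceAll("[^a-z0-9\\s]", " ")   // assumption: keep ASCII letters and digits only
                .trim();                           // drop leading and trailing whitespace
        if (normalized.isEmpty()) {
            // split() on an empty string would yield one empty token, so return early
            return Collections.emptyList();
        }
        return Arrays.asList(normalized.split("\\s+"));
    }

    public static void main(String[] args) {
        // prints [call, me, ishmael, some, years, ago]
        System.out.println(clean("  Call me Ishmael. Some years ago...  "));
    }
}
```

Even this small example needs a special case for empty input, which is the sort of detail that a well-tested third-party library already takes care of.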