Chapter 6: String Clustering
When wrangling data, you will frequently find columns whose values look alike but are not identical, such as spelling or formatting variants of the same entry. To handle this, Optimus provides techniques that detect similar strings and group them, and that suggest which value in each group is likely the best one. We will explore all these techniques in this chapter.
In this chapter, we will cover the following topics:
- Exploring string clustering
- Key collision methods
- Phonetic encoding
- Nearest-neighbor methods
- Applying suggestions
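
As a rough preview of the general idea, the following sketch groups strings that collapse to the same normalized key and proposes the most frequent spelling in each group as the suggested value. It is written in plain Python using only the standard library and is not the Optimus API; the function names `fingerprint` and `cluster` and the sample data are hypothetical, chosen only for illustration.

```python
import string
from collections import Counter, defaultdict

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def fingerprint(value):
    """Build a normalized key (hypothetical helper, not an Optimus function).

    Trims whitespace, lowercases, removes punctuation, then sorts and
    de-duplicates the tokens so word order and casing no longer matter.
    """
    cleaned = value.strip().lower().translate(_STRIP_PUNCTUATION)
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values by fingerprint and count how often each spelling occurs."""
    groups = defaultdict(Counter)
    for value in values:
        groups[fingerprint(value)][value] += 1
    # For each key, list (spelling, count) pairs from most to least frequent.
    return {key: counts.most_common() for key, counts in groups.items()}


names = ["New York", "new york", "York, New", "New York", "Boston"]
clusters = cluster(names)

for key, members in clusters.items():
    suggestion = members[0][0]  # the most frequent spelling in the group
    print(f"{key!r}: suggest {suggestion!r} for {[m for m, _ in members]}")
```

Here the three spellings of "New York" share one key, and the most common spelling is offered as the suggestion. The methods covered in this chapter apply the same pattern with different ways of deciding which strings belong together.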