Calculating Columns Using Complex Algorithms: Fuzzy Matching
In the previous chapter, we discussed how distance measures estimate the dissimilarity between two strings. This chapter builds on that idea with fuzzy matching, a technique for identifying records that logically refer to the same entity even though their values do not match exactly, as is typical of duplicates. Choosing a suitable dissimilarity metric for string values can be challenging. Fortunately, Power BI comes with a complex, reliable, and scalable fuzzy matching algorithm, implemented by the Microsoft Research team and based on the Jaccard distance. Although this algorithm performs well enough for typical fuzzy matching problems, other methods are available if you require more precision and control. In this chapter, we will explore probabilistic data association, another powerful tool for your analytics arsenal. We will use a probabilistic...