Deduplication of conflicting data items
Unfortunately, information about an item may be inconsistent throughout the corpus. Collision strategies are often domain-dependent, but one common way to manage such conflicts is simply to store all variations of the data. In this recipe, we will read a CSV file that contains information about musical artists and store all of the information about their songs and genres in sets.
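The idea of keeping every variation can be sketched in a few lines before we touch any CSV parsing. The following is only a minimal illustration, not the recipe's final code: the rows are hypothetical and hard-coded, and the names rows, mergeRows, and combine are invented for this sketch. It relies only on Data.Map and Data.Set from the containers package.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

-- Hypothetical (artist, song, genre) rows, made up for illustration only.
rows :: [(String, String, String)]
rows =
  [ ("Artist A", "Song 1", "Rock")
  , ("Artist A", "Song 2", "Pop")
  , ("Artist B", "Song 3", "Jazz")
  , ("Artist A", "Song 1", "Rock")  -- exact duplicate of the first row
  ]

-- Keep every distinct song and genre seen for each artist.
-- fromListWith calls 'combine' whenever two rows share the same artist key.
mergeRows :: [(String, String, String)]
          -> M.Map String (S.Set String, S.Set String)
mergeRows rs = M.fromListWith combine
  [ (name, (S.singleton song, S.singleton genre)) | (name, song, genre) <- rs ]
  where
    -- Set union keeps all variations and silently drops exact duplicates.
    combine (s1, g1) (s2, g2) = (S.union s1 s2, S.union g1 g2)

main :: IO ()
main = mapM_ print (M.toList (mergeRows rows))
```

Because the values are sets, the repeated row for Artist A adds nothing new, while the second song and genre are kept alongside the first instead of overwriting them.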
Getting ready
Create a CSV input file with the following musical artists. The first column is for the name of the artist or band. The second column is the song name, and the third is the genre. Notice how some musicians have multiple songs or genres.
How to do it...
Create a new file, which we will call Main.hs, and perform the following steps:
We will be using the CSV, Map, and Set modules:
import Text.CSV (parseCSV, Record)
import Data.Map (fromListWith)
import qualified Data.Set as S
Define the Artist data type corresponding to the CSV input. For fields that may contain conflicting...