Deduplication of nonconflicting data items
Duplication is a common problem when collecting large amounts of data. In this recipe, we will combine similar records in a way that ensures no information is lost.
Getting ready
Create an input.csv file with repeated data:
How to do it...
Create a new file, which we will call Main.hs, and perform the following steps:
We will be using the CSV and Map packages, along with the <|> operator from Control.Applicative, which works on Maybe values:
    import Text.CSV (parseCSV, Record)
    import Data.Map (fromListWith)
    import Control.Applicative ((<|>))
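As a rough illustration of why these two imports are useful for deduplication, the sketch below shows how (<|>) behaves on Maybe values and how fromListWith merges entries that share a key. The sample keys and values are made up for illustration and are not taken from input.csv, and the merging lambda is only one possible choice, not necessarily the one this recipe uses.

```haskell
import Control.Applicative ((<|>))
import qualified Data.Map as Map

-- (<|>) on Maybe keeps the first Just it encounters, reading left to right.
firstKnown :: Maybe String
firstKnown = Nothing <|> Just "red" <|> Just "blue"

-- fromListWith merges values that share a key. The combining function is
-- applied as (f newValue oldValue), so the lambda below prefers the value
-- seen earlier and falls back to the later one. Sample data is hypothetical.
merged :: Map.Map String (Maybe String)
merged = Map.fromListWith (\new old -> old <|> new)
  [ ("glasses", Nothing)
  , ("glasses", Just "black")
  , ("shirt",   Just "red")
  ]

main :: IO ()
main = do
  print firstKnown
  print (Map.toList merged)
```

Because a missing field is Nothing, the merge fills it from whichever duplicate carries a value, so nothing known is discarded.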
Define the Item data type corresponding to the CSV input:
    data Item = Item { name  :: String
                     , color :: Maybe String
                     , cost  :: Maybe Float
                     } deriving Show
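To make the link between a CSV row and this type concrete, here is a minimal sketch of turning one record into an Item. It assumes each row has exactly three columns in the order name, color, cost, and that an empty cell means the value is unknown. The helpers parseItem and nonEmpty are hypothetical names introduced for illustration, and Record is redefined locally so the sketch runs without the csv package.

```haskell
import Text.Read (readMaybe)

-- Local stand-in with the same shape as Text.CSV's Record (a list of fields).
type Record = [String]

data Item = Item { name  :: String
                 , color :: Maybe String
                 , cost  :: Maybe Float
                 } deriving Show

-- Hypothetical helper: treat an empty cell as a missing value.
nonEmpty :: String -> Maybe String
nonEmpty "" = Nothing
nonEmpty s  = Just s

-- Hypothetical helper: build an Item from a three-column row.
-- Assumes the column order name, color, cost. Rows of any other
-- length are rejected, and an unreadable cost becomes Nothing.
parseItem :: Record -> Maybe Item
parseItem [n, col, c] = Just (Item n (nonEmpty col) (readMaybe c))
parseItem _           = Nothing

main :: IO ()
main = do
  print (parseItem ["glasses", "black", "60"])
  print (parseItem ["glasses", "", ""])
  print (parseItem ["malformed"])
```

Modelling color and cost as Maybe lets a partially filled row be represented faithfully rather than with placeholder strings or zeros.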
Read each record from the CSV file and put it in a map by calling our doWork function:
    main :: IO ()
    main = do
      let fileName = "input.csv"
      input <- readFile fileName
      let csv = parseCSV fileName input
      either handleError doWork csv
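The last line of main relies on either, which picks one of two functions depending on whether the parse result is a Left (failure) or a Right (success). The sketch below shows only that dispatch, using stand-ins: the error is a plain String rather than the parser's own error type, and describe is a hypothetical function, not the recipe's handleError or doWork.

```haskell
-- Local stand-in for parsed CSV data: a list of rows, each a list of fields.
type CSV = [[String]]

-- Hypothetical function: either applies the first argument to a Left
-- value and the second argument to a Right value.
describe :: Either String CSV -> String
describe = either (\err  -> "parse failed: " ++ err)
                  (\rows -> show (length rows) ++ " rows parsed")

main :: IO ()
main = do
  putStrLn (describe (Left "unexpected end of input"))
  putStrLn (describe (Right [["glasses", "black", "60"], ["glasses", "", ""]]))
```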
If we're unable to parse CSV, print an error message...