Scrubbing and cleaning the data
Here comes the cleaning part!
Print some of the groceries contained within the Description field of OnlineRetail:
kable(OnlineRetail$Description[1:5], col.names = c("Grocery Item Descriptions"))

|Grocery Item Descriptions                 |
|:-----------------------------------------|
|WHITE HANGING HEART T-LIGHT HOLDER        |
|METAL METAL LANTERN                       |
|CREAM CUPID HEARTS COAT HANGER            |
|KNITTED UNION FLAG HOT WATER BOTTLE       |
|RED WOOLLY HOTTIE WHITE HEART.            |
Although each line contains a separate grocery item, the items are not in a uniform format: the number of words describing each item varies, and some words are adjectives while others are nouns. Additionally, the retailer may deem certain words irrelevant to a particular marketing campaign (such as colors or sizes, which may be standard across all products). This type of data can be referred to as semi-structured data, since it incorporates certain...
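
To make the point about varying word counts and irrelevant words concrete, here is a minimal sketch in base R. It uses the five descriptions printed above as a stand-in for `OnlineRetail$Description`. The `drop_words` vector is a hypothetical list of color words chosen purely for illustration. It is an assumption, not the retailer's actual list, and this is not necessarily the cleaning approach used later on.

```r
# Sample descriptions copied from the output shown above
descriptions <- c("WHITE HANGING HEART T-LIGHT HOLDER",
                  "METAL METAL LANTERN",
                  "CREAM CUPID HEARTS COAT HANGER",
                  "KNITTED UNION FLAG HOT WATER BOTTLE",
                  "RED WOOLLY HOTTIE WHITE HEART.")

# The number of words per description is not uniform
word_counts <- lengths(strsplit(descriptions, "\\s+"))

# Hypothetical list of words treated as irrelevant (an assumption)
drop_words <- c("WHITE", "CREAM", "RED")

# One pattern that matches any of those words as a whole word
pattern <- paste0("\\b(", paste(drop_words, collapse = "|"), ")\\b")

cleaned <- gsub(pattern, "", descriptions, perl = TRUE)  # drop the words
cleaned <- gsub("[[:punct:]]+$", "", cleaned)            # strip trailing punctuation
cleaned <- trimws(gsub("\\s+", " ", cleaned))            # tidy leftover whitespace

print(word_counts)
print(cleaned)
```

Matching on whole words (`\\b`) keeps the pattern from removing a color that merely appears inside a longer word. A real campaign would likely need a longer, curated word list.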