5: Text
Text comes in all shapes and sizes. Comments can be long or short, and they can contain misspellings, slang, foul language, and colloquialisms.
So how is text read and interpreted? What does textual ETL do to turn raw text into a database?
To see how text is processed, consider the following sample text, chosen at random:
I am so bummed to say this, but this was the worst experience we have ever had at your restaurant. We came here for my husband’s birthday with high hopes. We’re used to being greeted with the famous cheesy biscuits, but they didn’t come to our table till appetizer service (which took 20 minutes).
The server argued with me about a Long Island having sour mix added (he was saying it didn’t), and then goes on to tell us he’s being trained as a bartender.
We ordered the shrimp artichoke dip, and it showed up to our table boiled over in the sides and burnt on the rim. There...