Chapter 12. Text Mining
"I think it's much more interesting to live not knowing than to have answers which might be wrong." | ||
--Richard Feynman
The world is awash in textual data. If you search Google, Bing, or Yahoo for how much of the world's data is unstructured, that is, in a textual format, you will find estimates ranging from 80 to 90 percent. The exact number doesn't matter. What does matter is that a large proportion of data is in a text format. The implication is that anyone seeking insights in data must develop the capability to process and analyze text.
When I first started out as a market researcher, I used to manually pore over page after page of moderator-led focus groups and interviews in the hope of capturing some qualitative insight (an Aha! moment, if you will) and then haggle with fellow team members over whether they had the same insight or not. Then, you would always have that one individual in a project who would swoop in...