Disambiguation
Text data is often called an unstructured format. Text holds a lot of information, but it comes with no headings and no required format (save for normal grammatical rules), and its loose syntax, among other problems, makes that information hard to extract. The data is also highly connected, with many mentions and cross-references, just not in a format that lets us extract them easily! Even seemingly easy problems, such as determining whether a word is a noun, have many odd edge cases that make them difficult to solve reliably.
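To make the noun example concrete, here is a minimal sketch of one such edge case. It assumes a tiny hand-written lexicon (TOY_LEXICON, with illustrative tag names) and two hypothetical helper functions; none of these come from a real library, and a real tagger would use context rather than a plain lookup. The point is only that a word such as "book" can be a noun in one sentence and a verb in another, so a lookup by word alone cannot tell the two apart.

```python
# Hypothetical toy lexicon: each word is mapped to exactly one part of speech.
# The words and tag names are illustrative assumptions, not a real resource.
TOY_LEXICON = {
    "the": "DET",
    "a": "DET",
    "i": "PRON",
    "will": "AUX",
    "read": "VERB",
    "book": "NOUN",
    "flight": "NOUN",
}


def naive_is_noun(word):
    """Return True if the toy lexicon lists the word as a noun."""
    return TOY_LEXICON.get(word.lower()) == "NOUN"


def tag_nouns(sentence):
    """Pair each whitespace-separated token with the naive noun decision."""
    return [(token, naive_is_noun(token)) for token in sentence.split()]


# "book" is a noun here, and the lookup agrees.
noun_usage = tag_nouns("I read the book")

# "book" is a verb here, but the lookup still reports it as a noun,
# because it never looks at the surrounding words.
verb_usage = tag_nouns("I will book a flight")

print(noun_usage)
print(verb_usage)
```

Both sentences flag "book" as a noun, although only the first uses it that way. Resolving the second case needs the surrounding words, which is exactly the kind of interpretation that a database column never requires.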
We can compare the information stored in a book with that stored in a large database to see the difference. A book contains characters, themes, places, and plenty of other information. However, a book needs to be read and interpreted, with cultural context, to gain this information. In contrast, a database sits on your server with column names and data types. All the information is there and the level of interpretation needed to extract specific...