Data modalities
From a modality perspective, all data can be grouped into three categories: structured, semi-structured, and unstructured. The modality is independent of the data source, organization, or storage technologies. In fact, different representations, organizations, and storage technologies perform well with, at the most, one modality. It is very difficult to efficiently support more than one modality.
Structured data is usually stored in databases, Oracle, HBase, Cassandra, and so on. Relational tables are the most commonly used organization and storage mechanism. Usually, structured data formats, data types, and sizes are fixed and well known.
Semi-structured data, as the name implies, has enough structure; however, there is also variability in its size, type, and format. The most common semi-structured formats are
csv
,json
, andparquet
.Unstructured data, of course, is about 85 percent of the data we encounter. Images, audio files, and social media data all are unstructured. A...