When given a new dataset, it is important to first recognize whether your data is structured or unstructured:
- Structured (organized) data: Data that can be broken down into observations and characteristics. It is generally organized in a table (where rows are observations and columns are characteristics).
- Unstructured (unorganized) data: Data that exists as a free-flowing entity and does not follow a standard organizational hierarchy such as a table. Often, unstructured data appears to us as a blob of data, or as a single characteristic (column).
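To make the distinction concrete, here is a minimal Python sketch. The names, values, and the `column` helper are all invented for illustration and do not come from any real dataset. It shows that a characteristic can be read directly out of structured data, while the same information held as free text is just a single string.

```python
# Structured: each dict is an observation (row) and each key is a
# characteristic (column). All values are hypothetical.
structured = [
    {"name": "Ana", "age": 34, "city": "Lisbon"},
    {"name": "Ben", "age": 27, "city": "Denver"},
]

# Unstructured: the same kind of information as free text, with no
# predefined rows or columns. Also hypothetical.
unstructured = "Ana, 34, wrote in from Lisbon. Ben (27) is based in Denver."


def column(rows, name):
    """Return one characteristic (column) across all observations (rows)."""
    return [row[name] for row in rows]


# With structured data, a characteristic can be selected by name.
ages = column(structured, "age")
average_age = sum(ages) / len(ages)
print("ages:", ages)
print("average age:", average_age)

# The free text is one blob: a single string with no "age" column to select.
# Extracting the ages would first require parsing the text.
print("type of unstructured data:", type(unstructured).__name__)
```

The list of dictionaries stands in for a table here. In practice a tabular library would usually fill that role, but the plain-Python version keeps the sketch self-contained.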
A few examples that highlight the difference between structured and unstructured data are as follows:
- Data that exists in raw free-text form, including server logs and tweets, is unstructured
- Meteorological data, as reported by scientific instruments...