Unstructured versus structured data
As developers, we use many input and output methods and formats in our applications, and files are among the most common. We create files, copy or move them to different locations, and process them by reading and changing their contents.
A file contains, by default, what is known as unstructured data. This means that an audio, video, or text file is just data in a specific format. We can know the size of the file and the format used to store the data, and we may have access to some additional structured metadata, such as the creation date or the owner of the file, but we don’t know anything about the contents. What’s this video about? Is that audio file a song or a voice recording? Which language is spoken in the audio? Is that text file a poem, or does it contain the transcription of a movie? Being able to answer these questions can enable more useful and impactful features in our applications.
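To make the distinction concrete, here is a minimal Python sketch of the structured metadata we can read without understanding a file's contents. The helper name `describe_file` and the sample text are assumptions made for illustration, not part of any particular library. The sketch also reports the modification time rather than the creation date, because creation time is not exposed consistently across operating systems.

```python
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Return the structured metadata the file system exposes for a file.

    Hypothetical helper for illustration only.
    """
    info = os.stat(path)
    # The type is guessed from the file extension, not from the contents.
    guessed_type, _ = mimetypes.guess_type(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "guessed_type": guessed_type,
    }


# Create a small throwaway text file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as handle:
    handle.write("Shall I compare thee to a summer's day?")
    sample_path = handle.name

metadata = describe_file(sample_path)
print(metadata)

# Clean up the temporary file.
os.remove(sample_path)
```

The resulting dictionary tells us the file's name, size, modification time, and likely format, but nothing in it says that the text is a line from a sonnet. That is exactly the kind of question that remains open when the contents are unstructured.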
The simplest example...