Vector embeddings for unstructured data modeling
Vector embeddings are lists of floating-point numbers that describe the semantics of unstructured data. Their principal feature is a fixed size: each item gets a compact, dense representation, typically in fewer bytes than other encoding models require. In certain cases, the features can be engineered manually or with standard methods. One example of an embedding is the description of a color by its three RGB components. Using the RGB representation, we can express any color as an array of numbers:
[34, 93, 232]
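To make the idea concrete, the following minimal sketch treats RGB arrays like the one above as hand-engineered embeddings and compares them by Euclidean distance. The palette, its color values, and the helper names `euclidean_distance` and `nearest_color` are assumptions introduced here for illustration; they do not come from any particular library.

```python
import math

# Hypothetical palette: color names mapped to hand-engineered RGB embeddings.
PALETTE = {
    "red": [255, 0, 0],
    "green": [0, 128, 0],
    "blue": [0, 0, 255],
    "white": [255, 255, 255],
    "black": [0, 0, 0],
}


def euclidean_distance(a, b):
    """Return the straight-line distance between two vectors of equal length."""
    return math.dist(a, b)


def nearest_color(rgb, palette=PALETTE):
    """Return the name of the palette color whose vector is closest to rgb."""
    return min(palette, key=lambda name: euclidean_distance(rgb, palette[name]))


if __name__ == "__main__":
    # The example color from the text lies closest to pure blue in this palette.
    print(nearest_color([34, 93, 232]))
```

Because every color has the same fixed size (three components), any two colors can be compared with the same distance function. Embeddings produced by a learned model are typically compared in a similar way, only with many more dimensions.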
While this approach works well for this and many other data modeling problems, vector embeddings for unstructured data are nowadays generated with deep learning techniques. These techniques aim to produce models that do the following:
- Take the raw unstructured data as input (for example, a bitmap file or a voice recording).
- Capture the relevant and distinguishing...