Delta for unstructured data
The vast majority of data in the world is unstructured; analysts estimate it at 80 percent of all the data that organizations generate or otherwise acquire while doing business. Video, audio, and image files, as well as log files, sensor readings, and social media posts, all qualify as unstructured data, and this category is growing faster than structured data. Object storage has made it cheaper, more scalable, and more reliable to store data of every type, and this has been largely responsible for the growing range of use cases built on unstructured data, including a spike in the use of deep learning models. Typical use cases include the following:
- Image classification
- Voice recognition
- Anomaly detection
- Recommendation engine
- Sentiment analysis
- Video analysis
Spark supports the image format as well as the binary format. The image format has a few limitations around decoding image files during the creation of the DataFrame...