The Vision API lets us build a range of vision applications, such as:
- Label detection
- Text detection
- Face detection
- Emotion detection
- Logo detection
- Landmark detection
Before we build applications with these capabilities, let's briefly look at how such a capability might be built, using face emotion detection as an example.
The process of detecting emotions involves:
- Collecting a huge set of images
- Hand-labeling images with the emotion that is likely represented in the image
- Training a convolutional neural network (CNN), covered in later chapters, to classify the emotion shown in an input image
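The third step can be illustrated with a minimal sketch of what a CNN does at inference time, written in plain Python so it runs without a deep learning framework. It covers only the forward pass (convolution, ReLU, global average pooling, a dense layer, and softmax); training, which is where the hand-labeled images are actually used to learn the weights, is omitted. The emotion list, the toy image, and all weights are hypothetical and hand-picked rather than trained, so the predicted label only demonstrates how pixels flow through to a class, not a real prediction.

```python
import math

# Hypothetical label set: the text does not fix a specific list of emotions.
EMOTIONS = ["joy", "sorrow", "anger", "surprise"]


def conv2d_valid(image, kernel):
    """Slide a kernel over a 2D grayscale image (no padding, stride 1).

    As in most deep learning libraries, this is cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    # Zero out negative activations.
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_average_pool(feature_map):
    # Collapse a whole feature map into a single number.
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_emotion(image, kernels, dense_weights, dense_bias):
    # One feature per kernel: convolve, apply ReLU, then average the map.
    features = [global_average_pool(relu(conv2d_valid(image, k))) for k in kernels]
    # Dense layer: one logit per emotion class.
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(dense_weights, dense_bias)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs


# Toy 5x5 "face crop", brightest at the top-left (hypothetical input).
image = [[(24 - 5 * r - c) / 24 for c in range(5)] for r in range(5)]
kernels = [
    [[1, 0, -1], [1, 0, -1], [1, 0, -1]],  # vertical-edge detector
    [[1, 1, 1], [0, 0, 0], [-1, -1, -1]],  # horizontal-edge detector
]
# Untrained, hand-picked weights: 4 classes x 2 features.
dense_weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2], [0.2, 0.1]]
dense_bias = [0.0, 0.0, 0.0, 0.0]

label, probs = classify_emotion(image, kernels, dense_weights, dense_bias)
print(label, [round(p, 3) for p in probs])  # label is "sorrow" for these toy weights
```

In a real system the kernels and dense weights would be learned from the hand-labeled images (for example, by gradient descent in a framework such as TensorFlow or PyTorch), and the network would have many more layers and filters; the structure of the forward pass stays the same.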
These steps are resource-intensive, since collecting and hand-labeling the images takes a great deal of human effort, but there are several other ways to build face emotion detection. We are not sure how Google...