So far, we have seen that neither a pure CNN nor a pure Euclidean distance approach works well for facial recognition. However, we don't have to discard them entirely; each provides something useful. Can we combine them to form something better?
Intuitively, humans recognize faces by comparing key features: the shape of the eyes, the thickness of the eyebrows, the size of the nose, the overall shape of the face, and so on. This ability comes naturally to us, and we are seldom thrown off by variations in angle and lighting. Could we somehow teach a neural network to identify these features from images of faces, and then use the Euclidean distance to measure the similarity between the identified features? This should sound familiar to you! As we have seen in...
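To make the idea concrete, here is a minimal sketch in PyTorch. The class name `EmbeddingNet`, the layer sizes, and the 64x64 input size are illustrative assumptions, not a finished face-recognition architecture: a small CNN maps each face image to a fixed-length feature vector (an embedding), and the Euclidean distance between two embeddings measures how similar the faces are.

```python
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """Maps a face image to a fixed-size feature vector (embedding)."""
    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        # Convolutional layers extract visual features (edges, shapes, ...).
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 64x64 -> 32x32
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 32x32 -> 16x16
        )
        # A fully connected layer maps the features to the embedding.
        self.fc = nn.Linear(64 * 16 * 16, embedding_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = x.flatten(start_dim=1)
        return self.fc(x)

net = EmbeddingNet()
# Random tensors stand in for two preprocessed 64x64 RGB face images.
face_a = torch.randn(1, 3, 64, 64)
face_b = torch.randn(1, 3, 64, 64)

emb_a, emb_b = net(face_a), net(face_b)
# Euclidean distance between the two embeddings.
distance = torch.norm(emb_a - emb_b, p=2)
print(distance.item())
```

Of course, with randomly initialized weights this distance is meaningless; the network must first be trained so that embeddings of the same person end up close together and embeddings of different people end up far apart.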