Multi-modal deep learning
The dictionary definition of "modality" states that it is "a particular mode in which something exists or is experienced or expressed." Sensory modalities, like touch, taste, smell, vision, and sound, allow humans to experience the world around them. Suppose you are out at the farm picking strawberries, and your friend tells you to pick ripe and red strawberries. The instruction, ripe and red strawberries, is processed and converted into a visual and haptic criterion. As you see strawberries and feel them, you know instinctively if they match the criteria of ripe and red. This task is an example of multiple modalities working together for a task. As you can imagine, these capabilities are essential for robotics.
As a direct application of the preceding example, consider a harvesting robot that needs to pick ripe and ready fruit. In December 1976, Harry McGurk and John MacDonald published a piece of research titled Hearing lips and...