Let's see how this translates to a real-world problem such as 3D facial recognition, which is used in phones, security systems, and so on. With 2D images, recognition depends heavily on pose and illumination, and we have no access to depth information. Because of this limitation, we use 3D faces instead, so that we don't have to worry about lighting conditions, head orientation, and various facial expressions. For this task, the data we will be using are meshes.
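To make the idea of a mesh concrete, here is a minimal sketch in Python with NumPy. It assumes a triangle mesh stored as an array of vertex positions plus an array of faces that index into it; the toy tetrahedron and the helper names `mesh_edges` and `is_connected` are hypothetical and chosen only for illustration. The sketch derives the undirected edges from the faces and checks that every vertex can be reached from every other one.

```python
import numpy as np

# Hypothetical toy mesh: a tetrahedron with 4 vertices and 4 triangular faces.
vertices = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
faces = np.array([
    [0, 1, 2],
    [0, 1, 3],
    [0, 2, 3],
    [1, 2, 3],
])


def mesh_edges(faces):
    """Return the unique undirected edges (i, j), with i < j, of a triangle mesh."""
    # Each triangle (a, b, c) contributes the edges (a, b), (b, c), and (c, a).
    pairs = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    # Sort each pair so (i, j) and (j, i) collapse into one undirected edge.
    pairs = np.sort(pairs, axis=1)
    return np.unique(pairs, axis=0)


def is_connected(n, edges):
    """Check connectivity with a depth-first search over adjacency lists."""
    neighbours = [[] for _ in range(n)]
    for i, j in np.asarray(edges).tolist():
        neighbours[i].append(j)
        neighbours[j].append(i)
    seen, stack = {0}, [0]
    while stack:
        for k in neighbours[stack.pop()]:
            if k not in seen:
                seen.add(k)
                stack.append(k)
    return len(seen) == n


edges = mesh_edges(faces)
print(len(vertices), len(edges), is_connected(len(vertices), edges))
```

For the tetrahedron this prints `4 6 True`: four vertices, six undirected edges, and a single connected component. A real face scan would have thousands of vertices, but the same vertex and face arrays describe it.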
In this case, our meshes make up an undirected, connected graph, G = (V, E, A), where V is the set of |V| = n vertices, E is the set of edges, and A contains the d-dimensional pseudo-coordinates of the edges. The node feature matrix stores d-dimensional features for each of the n nodes. We then define the lth channel of the feature map as f_l, of which the ith node...