Transforming homogeneous GNNs to heterogeneous GNNs
To better understand the problem, let’s take a real dataset as an example. The DBLP computer science bibliography offers a dataset [2-3] that contains four types of nodes: papers (14,328), terms (7,723), authors (4,057), and conferences (20). The goal is to correctly classify the authors into four categories: database, data mining, artificial intelligence, and information retrieval. The authors’ node features are a binary bag-of-words (“0” or “1”) over 334 keywords they might have used in their publications. The following figure summarizes the relations between the different node types.
Figure 12.3 – Relationships between node types in the DBLP dataset
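To make the dataset description more concrete, here is a minimal sketch in plain Python of the node counts and of how a binary bag-of-words feature vector can be built. The node counts and class names come from the text above. The five-word vocabulary, the `bag_of_words` helper, and the example author are hypothetical and only illustrate the encoding; the real dataset uses 334 keywords.

```python
# Node counts and class names as stated in the text.
NODE_COUNTS = {"paper": 14_328, "term": 7_723, "author": 4_057, "conference": 20}
CLASSES = ["database", "data mining", "artificial intelligence", "information retrieval"]


def bag_of_words(used_keywords, vocabulary):
    """Return a binary vector: 1 if the keyword was used at least once, else 0."""
    used = set(used_keywords)
    return [1 if word in used else 0 for word in vocabulary]


# Hypothetical toy vocabulary (the real one has 334 entries).
vocabulary = ["graph", "query", "index", "learning", "retrieval"]

# Hypothetical author who used "query" twice and "index" once.
# Repetitions do not matter: the encoding only records presence or absence.
features = bag_of_words(["query", "index", "query"], vocabulary)
print(features)  # [0, 1, 1, 0, 0]

total_nodes = sum(NODE_COUNTS.values())
print(total_nodes)  # 26128
```

Note that only the author nodes are described as having these keyword features. The other node types carry different information, which is part of what makes the graph heterogeneous.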
These node types do not have the same dimensionalities and semantic relationships. In heterogeneous graphs, relations between nodes are essential, which is why we want to consider...