The Facebook AI team also made several modifications to standard negative sampling, which are necessary for large graphs. “We took advantage of the linearity of the functional form to reuse a single batch of N random nodes to produce corrupted negative samples for N training edges… this allows us to train on many negative examples per true edge at little computational cost,” says the Facebook AI team.
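To illustrate the trick the team describes, here is a minimal PyTorch sketch (not PBG's actual code) of reusing one shared batch of N random nodes as negatives for N training edges; the batch size, embedding dimension, and dot-product scoring function below are illustrative assumptions:

```python
import torch

# Illustrative sketch: because the score is a dot product (linear in the
# embeddings), all N x N corrupted-edge scores can be computed with a
# single matrix multiply instead of N separate negative batches.

N, dim = 1024, 400                      # batch size and embedding dimension (assumed)
src = torch.randn(N, dim)               # embeddings of the batch's source nodes
dst = torch.randn(N, dim)               # embeddings of the true destination nodes
neg = torch.randn(N, dim)               # ONE shared batch of random negative nodes

pos_scores = (src * dst).sum(dim=1)     # N scores for the true edges
neg_scores = src @ neg.t()              # N x N scores: every edge vs. every negative

# One possible way to consume the shared negatives: a margin ranking loss
# over all N negatives per positive edge.
loss = torch.relu(1.0 + neg_scores - pos_scores.unsqueeze(1)).mean()
```

The single `src @ neg.t()` multiply is what makes the extra negatives cheap: each true edge is scored against all N negatives without ever materializing N separate corrupted batches.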
To produce embeddings that are useful across different downstream tasks, the Facebook AI team found it effective to corrupt edges with a mix of negative nodes: 50 percent sampled uniformly from all nodes and 50 percent sampled in proportion to their number of edges.
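A hypothetical sketch of that 50/50 sampling mix is shown below; the node count and the placeholder degree counts are assumptions, and in practice the degrees would come from the actual graph:

```python
import torch

# Sketch of the mixed negative-sampling strategy: half of the corrupting
# nodes are drawn uniformly, half in proportion to node degree.

num_nodes = 1_000_000
degrees = torch.randint(1, 100, (num_nodes,)).float()  # placeholder degree counts

def sample_negatives(batch_size: int) -> torch.Tensor:
    half = batch_size // 2
    uniform = torch.randint(0, num_nodes, (half,))          # uniform over all nodes
    by_degree = torch.multinomial(degrees, batch_size - half,
                                  replacement=True)         # proportional to degree
    return torch.cat([uniform, by_degree])

negatives = sample_negatives(1024)  # node IDs used to corrupt a batch of edges
```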
To analyze PBG’s performance, Facebook AI used the publicly available Freebase knowledge graph, which comprises more than 120 million nodes and 2.7 billion edges, as well as FB15k, a smaller subset of the Freebase graph. PBG performed comparably to other state-of-the-art embedding methods on the FB15k data set.
PBG was also used to train embeddings for the full Freebase graph, where its partitioning scheme reduced both memory usage and training time. PBG embeddings were also evaluated on several publicly available social graph data sets, and PBG outperformed all the competing methods.
“We … hope that PBG will be a useful tool for smaller companies and organizations that may have large graph data sets but not the tools to apply this data to their ML applications. We hope that this encourages practitioners to release and experiment with even larger data sets,” states the Facebook AI team.
For more information, check out the official Facebook AI blog.