Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Graph Machine Learning

You're reading from   Graph Machine Learning Take graph data to the next level by applying machine learning techniques and algorithms

Arrow left icon
Product type Paperback
Published in Jun 2021
Publisher Packt
ISBN-13 9781800204492
Length 338 pages
Edition 1st Edition
Languages
Arrow right icon
Authors (3):
Arrow left icon
Aldo Marzullo Aldo Marzullo
Author Profile Icon Aldo Marzullo
Aldo Marzullo
Claudio Stamile Claudio Stamile
Author Profile Icon Claudio Stamile
Claudio Stamile
Enrico Deusebio Enrico Deusebio
Author Profile Icon Enrico Deusebio
Enrico Deusebio
Arrow right icon
View More author details
Toc

Table of Contents (15) Chapters Close

Preface 1. Section 1 – Introduction to Graph Machine Learning
2. Chapter 1: Getting Started with Graphs FREE CHAPTER 3. Chapter 2: Graph Machine Learning 4. Section 2 – Machine Learning on Graphs
5. Chapter 3: Unsupervised Graph Learning 6. Chapter 4: Supervised Graph Learning 7. Chapter 5: Problems with Machine Learning on Graphs 8. Section 3 – Advanced Applications of Graph Machine Learning
9. Chapter 6: Social Network Graphs 10. Chapter 7: Text Analytics and Natural Language Processing Using Graphs 11. Chapter 8:Graph Analysis for Credit Card Transactions 12. Chapter 9: Building a Data-Driven Graph-Powered Application 13. Chapter 10: Novel Trends on Graphs 14. Other Books You May Enjoy

Introduction to graphs with networkx

In this section, we will give a general introduction to graph theory. Moreover, in order to merge theoretical concepts with their practical implementation, we will enrich our explanation with code snippets in Python, using networkx.

A simple undirected graph (or simply, a graph) G is defined as a couple G=(V,E) , where V={, .., } is a set of nodes (also called vertices) and E={{, .., {,}} is a set of two-sets (set of two elements) of edges (also called links), representing the connection between two nodes belonging to V.

It is important to underline that since each element of E is a two-set, there is no order between each edge. To provide more detail, {, and {, represent the same edge.

We now provide definitions for some basic properties of graphs and nodes, as follows:

  • The order of a graph is the number of its vertices |V|. The size of a graph is the number of its edges |E|.
  • The degree of a vertex is the number of edges that are adjacent to it. The neighbors of a vertex v in a graph G is a subset of vertex induced by all vertices adjacent to v.
  • The neighborhood graph (also known as an ego graph) of a vertex v in a graph G is a subgraph of G, composed of the vertices adjacent to v and all edges connecting vertices adjacent to v.

An example of what a graph looks like can be seen in the following screenshot:

Figure 1.1 – Example of a graph

Figure 1.1 – Example of a graph

According to this representation, since there is no direction, an edge from Milan to Paris is equal to an edge from Paris to Milan. Thus, it is possible to move in the two directions without any constraint. If we analyze the properties of the graph depicted in Figure 1.1, we can see that it has order and size equal to 4 (there are, in total, four vertices and four edges). The Paris and Dublin vertices have degree 2, Milan has degree 3, and Rome has degree 1. The neighbors for each node are shown in the following list:

  • Paris = {Milan, Dublin}
  • Milan = {Paris, Dublin, Rome}
  • Dublin = {Paris, Milan}
  • Rome = {Milan}

The same graph can be represented in networkx, as follows:

import networkx as nx
G = nx.Graph()
V = {'Dublin', 'Paris', 'Milan', 'Rome'}
E = [('Milan','Dublin'), ('Milan','Paris'), ('Paris','Dublin'), ('Milan','Rome')]
G.add_nodes_from(V)
G.add_edges_from(E)

Since by default, the nx.Graph() command generates an undirected graph, we do not need to specify both directions of each edge. In networkx, nodes can be any hashable object: strings, classes, or even other networkx graphs. Let's now compute some properties of the graph we previously generated.

All the nodes and edges of the graph can be obtained by running the following code:

print(f"V = {G.nodes}")
print(f"E = {G.edges}")

Here is the output of the previous commands:

V = ['Rome', 'Dublin', 'Milan', 'Paris']
E = [('Rome', 'Milan'), ('Dublin', 'Milan'), ('Dublin', 'Paris'), ('Milan', 'Paris')]

We can also compute the graph order, the graph size, and the degree and neighbors for each of the nodes, using the following commands:

print(f"Graph Order: {G.number_of_nodes()}")
print(f"Graph Size: {G.number_of_edges()}")
print(f"Degree for nodes: { {v: G.degree(v) for v in G.nodes} }")
print(f"Neighbors for nodes: { {v: list(G.neighbors(v)) for v in G.nodes} }") 

The result will be the following:

Graph Order: 4
Graph Size: 4
Degree for nodes: {'Rome': 1, 'Paris': 2, 'Dublin':2, 'Milan': 3}
Neighbors for nodes: {'Rome': ['Milan'], 'Paris': ['Milan', 'Dublin'], 'Dublin': ['Milan', 'Paris'], 'Milan': ['Dublin', 'Paris', 'Rome']}

Finally, we can also compute an ego graph of a specific node for the graph G, as follows:

ego_graph_milan = nx.ego_graph(G, "Milan")
print(f"Nodes: {ego_graph_milan.nodes}")
print(f"Edges: {ego_graph_milan.edges}")

The result will be the following:

Nodes: ['Paris', 'Milan', 'Dublin', 'Rome']
Edges: [('Paris', 'Milan'), ('Paris', 'Dublin'), ('Milan', 'Dublin'), ('Milan', 'Rome')]

The original graph can be also modified by adding new nodes and/or edges, as follows:

#Add new nodes and edges
new_nodes = {'London', 'Madrid'}
new_edges = [('London','Rome'), ('Madrid','Paris')]
G.add_nodes_from(new_nodes)
G.add_edges_from(new_edges)
print(f"V = {G.nodes}")
print(f"E = {G.edges}")

This would output the following lines:

V = ['Rome', 'Dublin', 'Milan', 'Paris', 'London', 'Madrid']
E = [('Rome', 'Milan'), ('Rome', 'London'), ('Dublin', 'Milan'), ('Dublin', 'Paris'), ('Milan', 'Paris'), ('Paris', 'Madrid')]

Removal of nodes can be done by running the following code:

node_remove = {'London', 'Madrid'}
G.remove_nodes_from(node_remove)
print(f"V = {G.nodes}")
print(f"E = {G.edges}")

This is the result of the preceding commands:

V = ['Rome', 'Dublin', 'Milan', 'Paris']
E = [('Rome', 'Milan'), ('Dublin', 'Milan'), ('Dublin', 'Paris'), ('Milan', 'Paris')]

As expected, all the edges that contain the removed nodes are automatically deleted from the edge list.

Also, edges can be removed by running the following code:

node_edges = [('Milan','Dublin'), ('Milan','Paris')]
G.remove_edges_from(node_edges)
print(f"V = {G.nodes}")
print(f"E = {G.edges}")

The final result will be as follows:

V = ['Dublin', 'Paris', 'Milan', 'Rome']
E = [('Dublin', 'Paris'), ('Milan', 'Rome')]

The networkx library also allows us to remove a single node or a single edge from a graph G by using the following commands: G. remove_node('Dublin') and G.remove_edge('Dublin', 'Paris').

Types of graphs

In the previous section, we described how to create and modify simple undirected graphs. Here, we will show how we can extend this basic data structure in order to encapsulate more information, thanks to the introduction of directed graphs (digraphs), weighted graphs, and multigraphs.

Digraphs

A digraph G is defined as a couple G=(V, E), where V={, .., } is a set of nodes and E={(, .., (,)} is a set of ordered couples representing the connection between two nodes belonging to V.

Since each element of E is an ordered couple, it enforces the direction of the connection. The edge , means the node goes into . This is different from , since it means the node goes to .The starting node is called the head, while the ending node is called the tail.

Due to the presence of edge direction, the definition of node degree needs to be extended.

Indegree and outdegree

For a vertex v, the number of head ends adjacent to v is called the indegree (indicated by of v, while the number of tail ends adjacent to v is its outdegree (indicated by ).

An example of what a digraph looks like is available in the following screenshot:

Figure 1.2 – Example of a digraph

Figure 1.2 – Example of a digraph

The direction of the edge is visible from the arrow—for example, Milan -> Dublin means from Milan to Dublin. Dublin has = 2 and = 0, Paris has = 0 and = 2, Milan has = 1 and = 2, and Rome has = 1 and = 0.

The same graph can be represented in networkx, as follows:

G = nx.DiGraph()
V = {'Dublin', 'Paris', 'Milan', 'Rome'}
E = [('Milan','Dublin'), ('Paris','Milan'), ('Paris','Dublin'), ('Milan','Rome')]
G.add_nodes_from(V)
G.add_edges_from(E)

The definition is the same as that used for simple undirected graphs; the only difference is in the networkx classes that are used to instantiate the object. For digraphs, the nx.DiGraph()class is used.

Indegree and Outdegree can be computed using the following commands:

print(f"Indegree for nodes: { {v: G.in_degree(v) for v in G.nodes} }")
print(f"Outdegree for nodes: { {v: G.out_degree(v) for v in G.nodes} }")

The results will be as follows:

Indegree for nodes: {'Rome': 1, 'Paris': 0, 'Dublin': 2, 'Milan': 1}
Outdegree for nodes: {'Rome': 0, 'Paris': 2, 'Dublin': 0, 'Milan': 2}

As for the undirected graphs, G.add_nodes_from(), G.add_edges_from(), G.remove_nodes_from(), and G.remove_edges_from() functions can be used to modify a given graph G.

Multigraph

We will now introduce the multigraph object, which is a generalization of the graph definition that allows multiple edges to have the same pair of start and end nodes.

A multigraph G is defined as G=(V, E), where V is a set of nodes and E is a multi-set (a set allowing multiple instances for each of its elements) of edges.

A multigraph is called a directed multigraph if E is a multi-set of ordered couples; otherwise, if E is a multi-set of two-sets, then it is called an undirected multigraph.

An example of a directed multigraph is available in the following screenshot:

Figure 1.3 – Example of a multigraph

Figure 1.3 – Example of a multigraph

In the following code snippet, we show how to use networkx in order to create a directed or an undirected multigraph:

directed_multi_graph = nx.MultiDiGraph()
undirected_multi_graph = nx.MultiGraph()
V = {'Dublin', 'Paris', 'Milan', 'Rome'}
E = [('Milan','Dublin'), ('Milan','Dublin'), ('Paris','Milan'), ('Paris','Dublin'), ('Milan','Rome'), ('Milan','Rome')]
directed_multi_graph.add_nodes_from(V)
undirected_multi_graph.add_nodes_from(V)
directed_multi_graph.add_edges_from(E)
undirected_multi_graph.add_edges_from(E)

The only difference between a directed and an undirected multigraph is in the first two lines, where two different objects are created: nx.MultiDiGraph() is used to create a directed multigraph, while nx.MultiGraph() is used to build an undirected multigraph. The function used to add nodes and edges is the same for both objects.

Weighted graphs

We will now introduce directed, undirected, and multi-weighted graphs.

An edge-weighted graph (or simply, a weighted graph) G is defined as G=(V, E ,w) where V is a set of nodes, E is a set of edges, and is the weighted function that assigns at each edge a weight expressed as a real number.

A node-weighted graph G is defined as G=(V, E ,w) ,where V is a set of nodes, E is a set of edges, and is the weighted function that assigns at each node a weight expressed as a real number.

Please keep the following points in mind:

  • If E is a set of ordered couples, then we call it a directed weighted graph.
  • If E is a set of two-sets, then we call it an undirected weighted graph.
  • If E is a multi-set, we will call it a weighted multigraph (directed weighted multigraph).
  • If E is a multi-set of ordered couples, it is an undirected weighted multigraph.

An example of a directed edge-weighted graph is available in the following screenshot:

Figure 1.4 – Example of a directed edge-weighted graph

Figure 1.4 – Example of a directed edge-weighted graph

From Figure 1.4, it is easy to see how the presence of weights on graphs helps to add useful information to the data structures. Indeed, we can imagine the edge weight as a "cost" to reach a node from another node. For example, reaching Dublin from Milan has a "cost" of 19, while reaching Dublin from Paris has a "cost" of 11.

In networkx, a directed weighted graph can be generated as follows:

G = nx.DiGraph()
V = {'Dublin', 'Paris', 'Milan', 'Rome'}
E = [('Milan','Dublin', 19), ('Paris','Milan', 8), ('Paris','Dublin', 11), ('Milan','Rome', 5)]
G.add_nodes_from(V)
G.add_weighted_edges_from(E)

Bipartite graphs

We will now introduce another type of graph that will be used in this section: multipartite graphs. Bi- and tripartite graphs—and, more generally, kth-partite graphs—are graphs whose vertices can be partitioned in two, three, or more k-th sets of nodes, respectively. Edges are only allowed across different sets and are not allowed within nodes belonging to the same set. In most cases, nodes belonging to different sets are also characterized by particular node types. In Chapters 7, Text Analytics and Natural Language Processing Using Graphs, and Chapter 8, Graphs Analysis for Credit Cards Transaction, we will deal with some practical examples of graph-based applications and you will see how multipartite graphs can indeed arise in several contexts—for example, in the following scenarios:

  • When processing documents and structuring the information in a bipartite graph of documents and entities that appear in the documents
  • When dealing with transactional data, in order to encode the relations between the buyers and the merchants

A bipartite graph can be easily created in networkx with the following code:

import pandas as pd
import numpy as np
n_nodes = 10
n_edges = 12
bottom_nodes = [ith for ith in range(n_nodes) if ith % 2 ==0]
 top_nodes = [ith for ith in range(n_nodes) if ith % 2 ==1]
iter_edges = zip(
    np.random.choice(bottom_nodes, n_edges),  
    np.random.choice(top_nodes, n_edges))
edges = pd.DataFrame([
    {"source": a, "target": b} for a, b in iter_edges])
B = nx.Graph()
B.add_nodes_from(bottom_nodes, bipartite=0)
 B.add_nodes_from(top_nodes, bipartite=1)
 B.add_edges_from([tuple(x) for x in edges.values])

The network can also be conveniently plotted using the bipartite_layout utility function of networkx, as illustrated in the following code snippet:

from networkx.drawing.layout import bipartite_layout
pos = bipartite_layout(B, bottom_nodes)
 nx.draw_networkx(B, pos=pos)

The bipatite_layout function produces a graph, as shown in the following screenshot:

Figure 1.5 – Example of a bipartite graph

Figure 1.5 – Example of a bipartite graph

Graph representations

As described in the previous sections, with networkx, we can actually define and manipulate a graph by using node and edge objects. In different use cases, such a representation would not be as easy to handle. In this section, we will show two ways to perform a compact representation of a graph data structure—namely, an adjacency matrix and an edge list.

Adjacency matrix

The adjacency matrix M of a graph G=(V,E) is a square matrix (|V| × |V|) matrix such that its element is 1 when there is an edge from node i to node j, and 0 when there is no edge. In the following screenshot, we show a simple example where the adjacency matrix of different types of graphs is displayed:

Figure 1.6 – Adjacency matrix for an undirected graph, a digraph, a multigraph, and a weighted graph

Figure 1.6 – Adjacency matrix for an undirected graph, a digraph, a multigraph, and a weighted graph

It is easy to see that adjacency matrices for undirected graphs are always symmetric, since no direction is defined for the edge. The symmetry instead is not guaranteed for the adjacency matrix of a digraph due to the presence of constraints in the direction of the edges. For a multigraph, we can instead have values greater than 1 since multiple edges can be used to connect the same couple of nodes. For a weighted graph, the value in a specific cell is equal to the weight of the edge connecting the two nodes.

In networkx, the adjacency matrix for a given graph can be computed in two different ways. If G is the networkx of Figure 1.6, we can compute its adjacency matrix as follows:

nx.to_pandas_adjacency(G) #adjacency matrix as pd DataFrame
nt.to_numpy_matrix(G) #adjacency matrix as numpy matrix

For the first and second line, we get the following results respectively:

          Rome  Dublin  Milan  Paris
Rome     0.0     0.0    0.0    0.0
Dublin   0.0     0.0    0.0    0.0
Milan    1.0     1.0    0.0    0.0
Paris    0.0     1.0    1.0    0.0
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [1. 1. 0. 0.]
 [0. 1. 1. 0.]]

Since a numpy matrix cannot represent the name of the nodes, the order of the element in the adjacency matrix is the one defined in the G.nodes list.

Edge list

As well as an adjacency matrix, an edge list is another compact way to represent graphs. The idea behind this format is to represent a graph as a list of edges.

The edge list L of a graph G=(V,E) is a list of size |E| matrix such that its element is a couple representing the tail and the end node of the edge i. An example of the edge list for each type of graph is available in the following screenshot:

Figure 1.7 – Edge list for an undirected graph, a digraph, a multigraph, and a weighted graph

Figure 1.7 – Edge list for an undirected graph, a digraph, a multigraph, and a weighted graph

In the following code snippet, we show how to compute in networkx the edge list of the simple undirected graph G available in Figure 1.7:

print(nx.to_pandas_edgelist(G))

By running the preceding command, we get the following result:

  source  target
0  Milan  Dublin
1  Milan    Rome
2  Paris   Milan
3  Paris  Dublin

Other representation methods, which we will not discuss in detail, are also available in networkx. Some examples are nx.to_dict_of_dicts(G) and nx.to_numpy_array(G), among others.

You have been reading a chapter from
Graph Machine Learning
Published in: Jun 2021
Publisher: Packt
ISBN-13: 9781800204492
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €18.99/month. Cancel anytime