You're reading from Graph Data Modeling in Python A practical guide to curating, analyzing, and modeling data with graphs

Product type Paperback

Published in Jun 2023

Publisher Packt

ISBN-13 9781804618035

Length 236 pages

Edition 1st Edition

Languages

Python

Tools

Neo4j

Concepts

Data Analysis

Authors (2):

Gary Hutson

Matt Jackson

Table of Contents (16) Chapters

Preface

1. Part 1: Getting Started with Graph Data Modeling

2. Chapter 1: Introducing Graphs in the Real World

3. Chapter 2: Working with Graph Data Models

4. Part 2: Making the Graph Transition

5. Chapter 3: Data Model Transformation – Relational to Graph Databases

6. Chapter 4: Building a Knowledge Graph

7. Part 3: Storing and Productionizing Graphs

8. Chapter 5: Working with Graph Databases

9. Chapter 6: Pipeline Development

10. Chapter 7: Refactoring and Evolving Schemas

11. Part 4: Graphing Like a Pro

12. Chapter 8: Perfect Projections

13. Chapter 9: Common Errors and Debugging

14. Index

15. Other Books You May Enjoy

Introduction to NetworkX and igraph

In this chapter, we will introduce two Python packages for creating in-memory graphs: NetworkX and igraph.

NetworkX lets you create and manipulate graphs, and study and visualize their structure. The project website (https://networkx.org/) documents the major changes to the package and its intended usage.

igraph provides a suite of practical analysis tools designed to be efficient, easy to use, and reproducible. It is free and open source, and it can be used from R, Python, Mathematica, and C/C++. It is our recommended package for large networks, which it can load much more quickly than NetworkX. To read more about igraph, go to https://igraph.org/.

In the following subsections, we will look at the basics of both NetworkX and igraph, with easy-to-follow coding steps. This is the first time you are going to get your hands dirty with graph data modeling.

NetworkX basics

NetworkX is one of the earliest graph libraries for Python and focuses on being user-friendly and Pythonic in style. It also natively includes methods for calculating some classic network analysis measures. Let’s take a look at the basics:

To import NetworkX into Python, use the following command:
```
import networkx as nx
```
And to create an empty graph, g, use the following command:
```
g = nx.Graph()
```
Now, we need to add nodes to our graph, which can be done using methods of the Graph object belonging to g. There are multiple ways to do this, with the simplest being adding one node at a time:
```
g.add_node("Jeremy")
```
Alternatively, multiple nodes can be added to the graph at once, like so:
```
g.add_nodes_from(["Mark", "Jeremy"])
```
Properties can be added to nodes during creation by passing (node, property dictionary) tuples to Graph.add_nodes_from:
```
g.add_nodes_from([("Mark", {"followers": 2100}), ("Jeremy", {"followers": 130})])
```
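As a minimal sketch of how these properties can be read back, the snippet below rebuilds the same two nodes and inspects them through the graph's node view. The variable names `mark_followers` and `followers_by_name` are illustrative only.

```python
import networkx as nx

g = nx.Graph()
g.add_nodes_from([("Mark", {"followers": 2100}), ("Jeremy", {"followers": 130})])

# Look up a single property on one node via the node view.
mark_followers = g.nodes["Mark"]["followers"]

# Iterate over every node together with its property dictionary.
followers_by_name = {name: props["followers"] for name, props in g.nodes(data=True)}

print(mark_followers)
print(followers_by_name)
```

Adding a node that already exists does not create a duplicate; any properties supplied are merged into the existing node.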
To add an edge to the graph, we can use the Graph.add_edge method, and reference the nodes already present in the graph:
```
g.add_edge("Jeremy", "Mark")
```

It is worth noting that, in NetworkX, when adding an edge, any nodes specified as part of that edge not already in the graph will be added implicitly.
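The implicit node creation can be checked with a short sketch. It starts from an empty graph and adds a single edge; the node name "Sophie" is a hypothetical addition used only for illustration.

```python
import networkx as nx

g = nx.Graph()

# Neither endpoint exists yet; add_edge creates both nodes implicitly.
g.add_edge("Jeremy", "Sophie")  # "Sophie" is a made-up example node

nodes = sorted(g.nodes)
# The graph is undirected, so the edge is found in either direction.
edge_found = g.has_edge("Sophie", "Jeremy")

print(nodes, edge_found)
```

Nodes created this way carry no properties until they are assigned explicitly.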

To confirm that our graph now contains nodes and edges, we may want to plot it, using matplotlib and networkx.draw(). The with_labels parameter adds the names of the nodes to the plot:
```
import matplotlib.pyplot as plt
nx.draw(g, with_labels=True)
plt.show()
```
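As a rough illustration of the classic network analysis measures mentioned at the start of this section, the following sketch rebuilds the two-node graph and computes degree, density, and degree centrality. The choice of these three measures is ours; NetworkX offers many others.

```python
import networkx as nx

g = nx.Graph()
g.add_nodes_from([("Mark", {"followers": 2100}), ("Jeremy", {"followers": 130})])
g.add_edge("Jeremy", "Mark")

# Number of edges attached to each node.
degrees = dict(g.degree())

# Fraction of possible edges that are present (one edge out of one possible).
density = nx.density(g)

# Degree of each node divided by the maximum possible degree (n - 1).
centrality = nx.degree_centrality(g)

print(degrees, density, centrality)
```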

This section showed how to get up and running with NetworkX in a few lines of Python code. In the next section, we will turn our focus to igraph, which lets us perform calculations over larger datasets much more quickly than NetworkX.

igraph basics

NetworkX, while user-friendly, becomes slow on larger graphs. This is largely down to its implementation: the library itself is written in pure Python, and compiled C, C++, or Fortran code is involved only through optional dependencies such as NumPy and SciPy.

In contrast, the core of igraph is implemented in C, giving the library an advantage when working with large graphs and complex network algorithms. While not as immediately accessible as NetworkX for beginners, igraph is a useful tool to have under your belt when code efficiency is paramount.

Initially, working with igraph is very similar to working with NetworkX. Let’s take a look:

To import igraph into Python, use the following command:
```
import igraph as ig
```
And to create an empty graph, g, use the following command:
```
g = ig.Graph()
```

In contrast to NetworkX, in igraph, all nodes have a prescribed internal integer ID. The first node that’s added has an ID of 0, with all subsequent nodes assigned increasing integer IDs.

Similar to NetworkX, changes can be made to a graph by using the methods of a Graph object. Nodes can be added to the graph with the Graph.add_vertices method (note that a vertex is another way to refer to a node). Two nodes can be added to the graph with the following code:
```
g.add_vertices(2)
```
This will add nodes 0 and 1 to the graph. To name them, we have to assign properties to the nodes. We can do this by accessing the vertices of the Graph object. Similar to how you would access elements of a list, each node’s properties can be accessed by using the following notation. Here, we are setting the name and followers attributes of nodes 0 and 1:
```
g.vs[0]["name"] = "Jeremy"
g.vs[1]["name"] = "Mark"
g.vs[0]["followers"] = 130
g.vs[1]["followers"] = 2100
```
Node properties can also be added listwise, where the first list element corresponds to node ID 0, the second to node ID 1, and so on. The following two lines are equivalent to the four lines shown in the previous step:
```
g.vs["name"] = ["Jeremy", "Mark"]
g.vs["followers"] = [130, 2100]
```
To add an edge, we can use the Graph.add_edges() method:
```
g.add_edges([(0, 1)])
```

Here, we are only adding one edge, but additional edges can be added to the list parameter required by add_edges. Unlike NetworkX, add_edges does not create missing nodes implicitly: every node ID in an edge pair must already exist in the graph. Since igraph assigns sequential IDs starting from 0, attempting to add the edge pair (1, 3) to a graph with two vertices will fail.
