You're reading from Graph Data Science with Neo4j Learn how to use Neo4j 5 with Graph Data Science library 2.0 and its Python driver for your project

Product type: Paperback
Published: Jan 2023
Publisher: Packt
ISBN-13: 9781804612743
Length: 288 pages
Edition: 1st Edition
Languages: Python
Tools: Neo4j
Concepts: Data Science
Author: Estelle Scifo

Neo4j in the graph databases landscape

Even when restricting the scope to graph databases, there are still different ways to envision such data stores:

Resource Description Framework (RDF): Each record is a subject-predicate-object triple, which expresses a relationship of a certain type (the predicate) between a subject and an object; for instance:
```
Alice(Subject) KNOWS(Predicate) Bob(Object)
```

Well-known knowledge bases such as DBpedia and Wikidata use the RDF format. We will talk about this a bit more in the next chapter (Chapter 2, Importing Data into Neo4j to Build a Knowledge Graph).

Labeled-property graph (LPG): A labeled-property graph contains nodes and relationships. Nodes can carry labels and relationships have a type (for instance, Alice and Bob are nodes with the Person label, and the relationship between them has the KNOWS type), and both can have properties (people have names; an acquaintance relationship can store, as a property, the date when the two people first met).
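To make the contrast concrete, here is a minimal Python sketch (plain data structures, not Neo4j or any RDF library) of the same acquaintance stored both ways. The names, the `since` property, and its date value are hypothetical; the point is that an LPG relationship carries its own properties, whereas the plain list of triples modeled here only records that the link exists.

```python
from dataclasses import dataclass, field
from typing import Any

# RDF-style store: every record is a (subject, predicate, object) triple.
triples = [
    ("Alice", "KNOWS", "Bob"),
    ("Alice", "KNOWS", "Carol"),
]


def objects_of(subject: str, predicate: str) -> list[str]:
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# LPG-style store: nodes have labels and properties,
# relationships have a type and properties of their own.
@dataclass
class Node:
    labels: set[str]
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    rel_type: str
    end: Node
    properties: dict[str, Any] = field(default_factory=dict)


alice = Node({"Person"}, {"name": "Alice"})
bob = Node({"Person"}, {"name": "Bob"})
# The date the two people first met lives directly on the relationship (hypothetical value).
knows = Relationship(alice, "KNOWS", bob, {"since": "2015-06-01"})

print(objects_of("Alice", "KNOWS"))  # ['Bob', 'Carol']
print(knows.start.properties["name"], knows.rel_type,
      knows.end.properties["name"], knows.properties["since"])  # Alice KNOWS Bob 2015-06-01
```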

Neo4j is a labeled-property graph database. Even within this category, just as MySQL, PostgreSQL, and Microsoft SQL Server are all relational databases, you will find several vendors offering LPG databases. They differ in many aspects:

Whether or not they use a native graph engine: As we discussed earlier, it is possible to use a KV store or even a SQL database to store graph data. In this case, we’re talking about non-native storage engines, since the storage does not reflect the graph structure of the data.
The query language: Unlike SQL, the query language for graph data has not been standardized at the time of writing, although there is an ongoing effort led by the GQL group (see, for instance, https://gql.today/). Neo4j uses Cypher, a declarative query language developed by the company in 2011 and later open-sourced in the openCypher project, allowing other databases to use the same language (see, for instance, RedisGraph or Amazon Neptune). Other vendors have created their own languages (AQL for ArangoDB or GSQL for TigerGraph, for instance). To me, this is a key point to take into account, since the learning curve can be very different from one language to another. Cypher has the advantage of being very intuitive: a few minutes are enough to start writing your own queries.
Whether or not they offer integrated support for graph analytics and data science.
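The difference between native and non-native storage can be pictured with a toy Python model. This is a deliberate simplification, not Neo4j's actual store format: in the non-native version, relationships sit in a separate key-value map that must be searched to find a node's neighbors; in the native-style version, each node record keeps direct references to its relationships (an idea often called index-free adjacency), so a traversal simply follows them. The relationship IDs and names are hypothetical.

```python
# Relationships keyed by a hypothetical relationship ID, as a KV store might hold them.
edge_store = {
    "r1": ("Alice", "KNOWS", "Bob"),
    "r2": ("Bob", "KNOWS", "Carol"),
    "r3": ("Alice", "KNOWS", "Dave"),
}


def neighbors_non_native(name: str) -> list[str]:
    """Scan every stored relationship to find those starting at `name`."""
    return [end for start, _, end in edge_store.values() if start == name]


class NodeRecord:
    """Native-style node: holds direct references to its outgoing relationships."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.outgoing: list[tuple[str, "NodeRecord"]] = []


# Build the native-style structure once, wiring each node to its neighbors.
nodes = {name: NodeRecord(name) for name in ("Alice", "Bob", "Carol", "Dave")}
for start, rel_type, end in edge_store.values():
    nodes[start].outgoing.append((rel_type, nodes[end]))


def neighbors_native(name: str) -> list[str]:
    """Follow the node's own references; no global search is needed."""
    return [target.name for _, target in nodes[name].outgoing]


def friends_of_friends(name: str) -> set[str]:
    """Two-hop traversal that only follows node-to-node references."""
    return {fof.name for _, friend in nodes[name].outgoing for _, fof in friend.outgoing}


print(neighbors_native("Alice"))  # ['Bob', 'Dave']
print(friends_of_friends("Alice"))  # {'Carol'}
```

Assuming the nodes carried a Person label and a name property, the same two-hop question in Cypher would read `MATCH (:Person {name: 'Alice'})-[:KNOWS]->()-[:KNOWS]->(fof) RETURN DISTINCT fof.name`, which illustrates why the pattern-based syntax is often described as intuitive.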

A note about performance

Almost every vendor claims to be the best, at least in some aspects. This book won’t start another debate on that topic. If performance is crucial for your application, the best option is to test the candidates with a scenario close to your final goal in terms of data volume and type of queries/analysis.
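A minimal, database-agnostic timing harness along these lines can support such a comparison. `run_query` is a hypothetical zero-argument callable wrapping one representative query for the candidate under test (for example, a driver call); the warm-up count and the choice of reporting the median are assumptions, not a standard benchmarking methodology.

```python
import statistics
import time
from typing import Callable


def benchmark(run_query: Callable[[], object], warmup: int = 3, repeats: int = 10) -> dict[str, float]:
    """Time `run_query` and summarize its latency in milliseconds.

    Warm-up runs are executed but not measured, so caches and query plans
    have a chance to be populated before timing starts.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        run_query()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(timings),
        "min_ms": min(timings),
        "max_ms": max(timings),
    }
```

To keep the comparison fair, each candidate should be fed the same data volume and the same query mix; a single small query on a toy dataset says little about production behavior.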

Neo4j ecosystem

The Neo4j database is already very helpful by itself, but the number of extensions, libraries, and applications related to it makes it one of the most complete solutions available. In addition, it has a very active community whose members are always keen to help each other, which is another reason to choose it.

The core capabilities of the Neo4j database can be extended with plugins. Awesome Procedures on Cypher (APOC), a widely used Neo4j extension, contains procedures that extend both the database and Cypher capabilities. We will use it later in this book to load JSON data.
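As a preview, the following sketch shows what such an import could look like when sent from Python. It assumes APOC is installed, that the JSON document at `url` has a hypothetical `{"people": [{"name": ...}, ...]}` shape, and that `session` is an open session from the official `neo4j` Python driver (any object with a compatible `run` method works, which keeps the sketch free of imports). The document is fetched by the database server, so the URL must be reachable from there.

```python
# Cypher sent to the server: apoc.load.json parses the document and yields it as `value`.
LOAD_PEOPLE_QUERY = """
CALL apoc.load.json($url) YIELD value
UNWIND value.people AS person
MERGE (:Person {name: person.name})
"""


def load_people(session, url: str) -> None:
    """Import Person nodes from a JSON document through APOC.

    `url` is passed as a query parameter rather than concatenated into the
    query string, which avoids injection issues and lets the server reuse the plan.
    """
    # consume() discards any remaining records and waits for the query to finish.
    session.run(LOAD_PEOPLE_QUERY, url=url).consume()
```

With the real driver, this might be called as `with driver.session() as session: load_people(session, url)`, where `driver` comes from `neo4j.GraphDatabase.driver(uri, auth=(user, password))`.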

The main plugin we will explore in this book is the Graph Data Science (GDS) Library. Its predecessor, the Graph Algorithms Library, was first released in 2018 by the Neo4j Labs team. It was quickly replaced by the Graph Data Science Library, a fully production-ready plugin with improved performance. Algorithms are improved and added regularly. Version 2.0, released in 2021, takes graph data science even further, allowing us to train models and build analysis pipelines directly from the library. It also comes with a handy Python client, which makes it easy to include graph algorithms in your usual machine learning processes, whether you use scikit-learn or other machine learning libraries such as TensorFlow or PyTorch.
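To give a feel for the Python client, here is a hedged sketch that projects an in-memory graph, runs PageRank (chosen only as a familiar example algorithm), and returns the scores as a pandas DataFrame ready to join a scikit-learn or other ML workflow. It assumes `gds` is a `graphdatascience.GraphDataScience` instance connected to a database containing `Person` nodes and `KNOWS` relationships; the projection name `people` is hypothetical.

```python
def top_pagerank(gds, limit: int = 5):
    """Return the `limit` highest PageRank scores for the Person/KNOWS graph.

    `gds` is expected to be a graphdatascience.GraphDataScience object; it is
    not imported here so the function can be defined without the package.
    """
    # Project Person nodes and KNOWS relationships into the GDS in-memory graph catalog.
    G, _ = gds.graph.project("people", "Person", "KNOWS")
    try:
        # Stream mode returns a pandas DataFrame with `nodeId` and `score` columns.
        scores = gds.pageRank.stream(G)
        return scores.sort_values("score", ascending=False).head(limit)
    finally:
        # Drop the projection so it does not keep occupying memory.
        G.drop()
```

With a live connection, this could be called as `top_pagerank(GraphDataScience("bolt://localhost:7687", auth=("neo4j", "password")))`, with placeholder credentials. Dropping the projection in `finally` is a design choice: a failed algorithm call then does not leave the named graph behind, which would make the next `project` call with the same name fail.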

Besides plugins, many applications help us work with Neo4j and explore the data it contains. The first one we will use is Neo4j Desktop, which lets us manage several Neo4j databases as well as their installed plugins and applications. Continue reading to learn how to use it.

Applications installed in Neo4j Desktop are granted access to your active database. While reading this book, you will use the following:

Neo4j Browser: A simple but powerful application that lets you write Cypher queries and visualize the result as a graph, table, or JSON:

Figure 1.4 – Neo4j Browser

Neo4j Bloom: A graph visualization application in which you can customize node styles (size, color, and so on) based on their labels and/or properties:

Figure 1.5 – Neo4j Bloom

NeoDash: A dashboard application that lets us draw plots from data stored in Neo4j without first extracting it into a DataFrame. Plots can be organized into dashboards and shared with other users:

Figure 1.6 – NeoDash

This list of applications is non-exhaustive. You can find out more here: https://install.graphapp.io/.

Good to know

You can create your own graph application to run within Neo4j Desktop. This openness explains the wide variety of applications available, some of which are developed by community members or Neo4j partners.

This section described Neo4j as a database and the various extensions that can be added to it to make it more powerful. Now, it is time to start using it. In the following section, you are going to install Neo4j locally on your computer so that you can run the code examples provided in this book (which you are highly encouraged to do!).
