Neo4j graph databases: working from JupyterLab.

Neo4j is a graph database: it uses graph structures (nodes, edges, and properties) to represent, store, and query data. The key concept is the graph, which relates the data items in the store to a collection of nodes and edges, with the edges representing the relationships between the nodes. These relationships link data in the store together directly, so related items can often be retrieved in a single operation. Graph databases treat the relationships between data as a priority. Querying relationships is fast because they are stored persistently in the database itself rather than computed at query time. Relationships can also be visualized intuitively, which makes graph databases well suited to heavily interconnected data.
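
To make the node, edge, and property vocabulary concrete, here is a minimal in-memory sketch in plain Python. It is not how Neo4j stores data internally; it only illustrates why following a relationship is a direct lookup when edges are stored alongside their start node. The names Alice, Bob, and Carol, the weight property, and the helper functions are hypothetical; the Character label and INTERACTS relationship type mirror the queries used later in this post.

```python
from collections import defaultdict

# Nodes: id -> properties (the "label" key stands in for a Neo4j label).
# The character names are made-up illustration data.
nodes = {
    1: {"label": "Character", "name": "Alice"},
    2: {"label": "Character", "name": "Bob"},
    3: {"label": "Character", "name": "Carol"},
}

# Edges are kept with the node they start from, so following a
# relationship is a direct lookup rather than a search over all edges.
edges = defaultdict(list)


def relate(start, rel_type, end, **props):
    """Store a directed, typed edge with optional properties."""
    edges[start].append({"type": rel_type, "end": end, "props": props})


def neighbours(start, rel_type):
    """Return the names of nodes reached from `start` over `rel_type` edges."""
    return [nodes[e["end"]]["name"] for e in edges[start] if e["type"] == rel_type]


relate(1, "INTERACTS", 2, weight=5)
relate(1, "INTERACTS", 3, weight=2)
relate(2, "INTERACTS", 3, weight=1)

print(neighbours(1, "INTERACTS"))  # ['Bob', 'Carol']
```

In a relational schema the same question would typically need a join through a separate link table; here the relationship is part of the stored structure.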

Graph databases belong to the NoSQL family of databases, which was created to address the limitations of relational databases. The graph model lays out the dependencies between nodes of data explicitly, whereas the relational model and other NoSQL database models link the data through implicit connections. By design, graph databases allow simple and fast retrieval of complex hierarchical structures that are difficult to model in relational systems.

Many open source packages and graph platforms exist, with support for both Python and R. We describe the ones used in our lab to connect to Neo4j databases from our JupyterLab server.

Neo4j Python driver (the fastest).

We install the neo4j driver:

pip3 install neo4j

Python code to open a connection to the database and run a simple query:

# import the neo4j driver for python
from neo4j import GraphDatabase

# database credentials
uri = "bolt://localhost:7687"
userName = "xxx"
password = "yyy"

# connect to the neo4j database server
graphDB_Driver = GraphDatabase.driver(uri, auth=(userName, password))

# specify the cypher query
cql = "MATCH (n:Character) RETURN n.name LIMIT 5" 

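# run the cypher query and print each returned record
with graphDB_Driver.session() as graphDB_Session:
    nodes = graphDB_Session.run(cql)
    for node in nodes:
        print(node)

# close the driver to release its connections when done
graphDB_Driver.close()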
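
As a variation on the query above, the following sketch passes the limit as a query parameter instead of writing it into the Cypher string, and returns plain Python strings rather than record objects. The helper names character_names and main are hypothetical, and the URI and credentials are the same placeholders used above; it assumes a server version that accepts the $name parameter syntax.

```python
def character_names(session, limit=5):
    """Return up to `limit` Character names using a parameterized query."""
    query = "MATCH (n:Character) RETURN n.name AS name LIMIT $limit"
    # Keyword arguments to run() are sent to the server as query parameters.
    result = session.run(query, limit=limit)
    # Each record can be indexed by the alias used in the RETURN clause.
    return [record["name"] for record in result]


def main():
    # Imported here so the helper above can be reused without the driver
    # installed; the URI and credentials are placeholders.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("xxx", "yyy"))
    try:
        with driver.session() as session:
            print(character_names(session))
    finally:
        # Release the driver's connections even if the query fails.
        driver.close()
```

Calling main() in a notebook cell runs the query against the configured server. Keeping the query logic in a function that only needs a session makes it easy to reuse across cells.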

py2neo (user-friendly, but too slow).

We install the py2neo package:

pip3 install py2neo

Python code to open a connection to the database and run a simple query:

# import the Graph class from py2neo
from py2neo import Graph

# database credentials
uri = "bolt://localhost:7687"
userName = "xxx"
password = "yyy"

# connect to the neo4j database server
graph = Graph(uri, auth=(userName, password))

# run the cypher query
graph.run("MATCH (a:Character) RETURN a.name AS Name LIMIT 5").to_table()

The to_table() call returns an instance of the new Table class, which provides methods for multiple styles of output and allows results to be rendered elegantly in Jupyter.

Neo4jupyter (visualize queries from Neo4j in the Jupyter notebook).

We install the neo4jupyter package:

pip3 install neo4jupyter

Python code to open a connection to the database and display some nodes:

# import the Graph class from py2neo
from py2neo import Graph

# database credentials
uri = "bolt://localhost:7687"
userName = "xxx"
password = "yyy"

# connect to the neo4j database server
graph = Graph(uri, auth=(userName, password))

# import neo4jupyter from python
import neo4jupyter

# load all the javascript
neo4jupyter.init_notebook_mode()

# define the parameters that you want to be displayed
options = {"Character": "name"}

# display the graph
neo4jupyter.draw(graph, options)

Update: neo4jupyter does not work in JupyterLab yet, but it works in the classic Jupyter Notebook.

Neo4j from R (neo4r).

The goal of neo4r is to provide a modern and flexible Neo4j driver for R. It is modern in the sense that results are returned as tibbles whenever possible, it relies on modern tools, and it is designed to work with pipes. The driver can be easily integrated into a data analysis workflow, in particular because its API works smoothly with other data analysis (tidyverse) and graph packages (igraph, ggraph, visNetwor…).

Please note that, for now, the connection is only possible through HTTP(S). This means neo4r does not yet work with Neo4j 4.x versions (bolt connector).
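
To show what a connection over HTTP involves, as opposed to bolt, here is a Python sketch that builds (without sending) a request for the transactional HTTP endpoint of a Neo4j 3.x server. This is an illustration only and not how neo4r is implemented. The function name build_cypher_request is hypothetical, and the host, port 7474, and credentials are assumed placeholders; the endpoint path is different on Neo4j 4.x.

```python
import base64
import json
import urllib.request


def build_cypher_request(base_url, user, password, statement, parameters=None):
    """Build (but do not send) a POST request for the Neo4j 3.x HTTP endpoint."""
    payload = {
        "statements": [{"statement": statement, "parameters": parameters or {}}]
    }
    # HTTP basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/db/data/transaction/commit",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


request = build_cypher_request(
    "http://localhost:7474", "xxx", "yyy",
    "MATCH (n:Character) RETURN n.name LIMIT $limit", {"limit": 5},
)
print(request.full_url)

# Sending the request needs a reachable server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The query travels as JSON in the request body and the result comes back as JSON, which is why an HTTP-only client can be written without a bolt implementation.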

We install the neo4r package:

install.packages("neo4r")

Create a connection object:

library(neo4r)
con <- neo4j_api$new(
    url = "xxx",
    user = "yyy",
    password = "zzz"
)

Query the server through the connection object:

con$ping()
con$get_version()
con$get_constraints()
con$get_labels()
con$get_relationships()
con$get_schema()

Write your Cypher query as a character vector, and send it with call_neo4j():

library(magrittr)

"MATCH (n:Character)-[:INTERACTS]->() RETURN n LIMIT 5;" %>%
    call_neo4j(con, type = "graph") %>%
    extract_nodes()

You can extract only the nodes (or only the relationships) with the extract_* functions.

Graph results can be turned into igraph objects and plotted:

"MATCH n=()-[r:INTERACTS]->() RETURN n LIMIT 5;" %>%
    call_neo4j(con, type = "graph") %>% 
    convert_to("igraph") %>% 
    plot()
