Getting started

The design philosophy of the Python client is to mimic the GDS Cypher API in Python code. The Python client will translate the Python code written by the user to a corresponding Cypher query which it will then run on the Neo4j server using a Neo4j Python driver connection.

The Python client attempts to be as pythonic as possible to maximize convenience for users accustomed to and experienced with Python environments. As such standard Python and pandas types are used as much as possible. However, to be consistent with the Cypher surface the general return value of calling a method corresponding to a Cypher procedure will be in the form of a table (a pandas DataFrame in Python). Read more about this in Mapping between Cypher and Python.

The root component of the Python client is the GraphDataScience object. Once instantiated it forms the entrypoint to interacting with the GDS library. That includes projecting graphs, running algorithms, and defining and using machine learning pipelines in GDS. As a convention we recommend always calling the instantiated GraphDataScience object gds as using it will then most resemble using the Cypher API directly.

1. Import and setup

The simplest way to instantiate the GraphDataScience object is from a Neo4j server URI and corresponding credentials:

from graphdatascience import GraphDataScience

# Use Neo4j URI and credentials according to your setup
gds = GraphDataScience("bolt://localhost:7687", auth=None)

print(gds.version())
Results:
"2.1.9"

Alternatively, for use cases where direct access and control of the Neo4j driver is required, one can use the method GraphDataScience.from_neo4j_driver for instantiating the gds object.

If we don’t want to use the default database of our DBMS, we can specify which one to use:

gds.set_database("my-db")

To check if the GDS server library we’re running against is licensed we can make the following call:

using_enterprise = gds.is_licensed()

The client library is designed so that most methods are inferred under the hood as you type them via a string building scheme and overloading the magic __getattr__ method. Therefore most methods, such as pageRank, will not appear when calling dir(gds). Similarly, IDEs and language servers will not be able to detect these automatically inferred methods, meaning that the auto-completion support they provide will be limited. Rest assured however that despite the lack of this type of discoverability the inferred methods, such as gds.pageRank.stream, will still be called correctly.

1.1. AuraDS

If you are connecting the client to an AuraDS instance, you can get recommended non-default configuration settings of the Python Driver applied automatically. To achieve this, set the constructor argument aura_ds=True:

from graphdatascience import GraphDataScience

# Configures the driver with AuraDS-recommended settings
gds = GraphDataScience(
    "neo4j+s://my-aura-ds.databases.neo4j.io:7687",
    auth=("neo4j", "my-password"),
    aura_ds=True
)

2. Minimal example

In the following example we illustrate the Python client to run a Cypher query, project a graph into GDS, run an algorithm and inspect the result via the client-side graph object.

from graphdatascience import GraphDataScience

# We follow the convention to name our `GraphDataScience` object `gds`
gds = GraphDataScience("bolt://my-server.neo4j.io:7687", auth=("neo4j", "my-password"))

# Create a minimal example graph
gds.run_cypher(
  """
  CREATE
  (m: City {name: "Malmö"}),
  (l: City {name: "London"}),
  (s: City {name: "San Mateo"}),
  (m)-[:FLY_TO]->(l),
  (l)-[:FLY_TO]->(m),
  (l)-[:FLY_TO]->(s),
  (s)-[:FLY_TO]->(l)
  """
)

# Project the graph into the GDS Graph Catalog
# We call the object representing the projected graph `G_office`
G_office, project_result = gds.graph.project("neo4j-offices", "City", "FLY_TO")

# Run the mutate mode of the PageRank algorithm
mutate_result = gds.pageRank.mutate(G_office, tolerance=0.5, mutateProperty="rank")

# We can inspect the node properties of our projected graph directly
# via the graph object and see that indeed the new property exists
print(G_office.node_properties("City"))
Results:
["rank"]

3. Close open connections

Similarly to how the Neo4j Python driver supports closing all open connections to the DBMS, you can call close on the GraphDataScience object to the same effect:

# Close any open connections in the underlying Neo4j driver's connection pool
gds.close()

close is also called automatically when the GraphDataScience object is deleted.

4. Mapping between Cypher and Python

There are some general principles for how the Cypher API maps to the Python client API:

  • Method calls corresponding to Cypher procedures (preceded by CALL in the docs) return:

    • A table as a pandas DataFrame, if the procedure returns several rows (eg. stream mode algorithm calls).

    • A row as a pandas Series, if the procedure returns exactly one row (eg. stats mode algorithm calls).

    Some notable exceptions to this are:

    • Procedures instantiating graph objects and model objects have two return values: a graph or model object, and a row of metadata (typically a pandas Series) from the underlying procedure call.

    • Any methods on pipeline, graph or model objects (native to the Python client) mapping to Cypher procedures.

    • gds.version() which returns a string.

  • Method calls corresponding to Cypher functions (preceded by RETURN in the docs) will simply return the value the function returns.

  • The Python client also contains specific functionality for inspecting graphs from the GDS Graph Catalog, using a client-side graph object. Similarly, models from the GDS Model Catalog can be inspected using a client-side model object.

  • Cypher functions and procedures of GDS that take references to graphs and/or models as strings for input typically instead take graph objects and/or model objects as input in the Python client API.

  • To configure and use machine learning pipelines in GDS, specific pipeline objects are used in the Python client.