Getting started

The design philosophy of the Python client is to mimic the GDS Cypher API in Python code. The Python client will translate the Python code written by the user to a corresponding Cypher query which it will then run on the Neo4j server using a Neo4j Python driver connection.

The Python client attempts to be as pythonic as possible to maximize convenience for users accustomed to Python environments. As such, standard Python and pandas types are used as much as possible. However, to be consistent with the Cypher surface, calling a method that corresponds to a Cypher procedure generally returns a table (a pandas DataFrame in Python). Read more about this in Mapping between Cypher and Python.

The root component of the Python client is the GraphDataScience object. Once instantiated, it forms the entry point for interacting with the GDS library. That includes projecting graphs, running algorithms, and defining and using machine learning pipelines in GDS. As a convention, we recommend always naming the instantiated GraphDataScience object gds, since using it will then most closely resemble using the Cypher API directly.

1. Import and setup

The simplest way to instantiate the GraphDataScience object is from a Neo4j server URI and corresponding credentials:

from graphdatascience import GraphDataScience

# Use Neo4j URI and credentials according to your setup
# NEO4J_URI could look similar to "bolt://my-server.neo4j.io:7687"
gds = GraphDataScience(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))

# Check the installed GDS version on the server
print(gds.version())
assert gds.version()

Results:
"2.1.9"

Please note that the GraphDataScience object needs to communicate with a Neo4j database upon construction, and targets the "neo4j" database by default. If there is no such database, you will need to provide a valid one using the database keyword parameter.
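The placeholders NEO4J_URI, NEO4J_USER and NEO4J_PASSWORD used above have to be defined somewhere. One possible approach, sketched here, is to read them from environment variables. The environment variable names, the fallback values and the helper name connection_settings are assumptions made for this illustration and are not part of the client.

```python
import os


def connection_settings(env=None):
    """Return (uri, auth) read from environment-style variables.

    The variable names and fallback values are illustrative assumptions.
    """
    env = os.environ if env is None else env
    uri = env.get("NEO4J_URI", "bolt://localhost:7687")
    user = env.get("NEO4J_USER", "neo4j")
    password = env.get("NEO4J_PASSWORD", "")
    return uri, (user, password)


NEO4J_URI, (NEO4J_USER, NEO4J_PASSWORD) = connection_settings()
```

The resulting values can then be passed to the constructor exactly as in the snippet above.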

1.1. AuraDS

If you are connecting the client to an AuraDS instance, you can get recommended non-default configuration settings of the Python Driver applied automatically. To achieve this, set the constructor argument aura_ds=True:

from graphdatascience import GraphDataScience

# Configures the driver with AuraDS-recommended settings
gds = GraphDataScience(
    "neo4j+s://my-aura-ds.databases.neo4j.io:7687",
    auth=("neo4j", "my-password"),
    aura_ds=True
)

1.2. Instantiating from a Neo4j driver

For some use cases, direct access to and control of the Neo4j driver is required, for example when the driver needs to be configured in a specific way. In this case, one can use the method GraphDataScience.from_neo4j_driver to instantiate a GraphDataScience object. It takes the same arguments as the regular GraphDataScience constructor, except for the aura_ds keyword parameter, which is only relevant when the underlying Neo4j driver is instantiated internally.
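As a rough sketch of this, the helper below creates a Neo4j driver with one non-default setting and hands it to from_neo4j_driver. The helper name gds_from_custom_driver and the chosen max_connection_lifetime value are assumptions for illustration only; any other driver configuration could be used instead.

```python
def gds_from_custom_driver(uri, user, password):
    """Create a GraphDataScience object on top of a hand-configured driver."""
    # Imported inside the function so that this sketch can be loaded
    # even where the two packages are not installed.
    from graphdatascience import GraphDataScience
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(
        uri,
        auth=(user, password),
        max_connection_lifetime=600,  # seconds; illustrative value
    )
    # Same arguments as the regular constructor, minus aura_ds
    return GraphDataScience.from_neo4j_driver(driver, auth=(user, password))
```

Calling gds_from_custom_driver(NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD) requires a reachable Neo4j server with GDS installed, just like the regular constructor.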

1.3. Checking license status

To check whether the GDS server library we’re running against has an enterprise license, we can make the following call:

using_enterprise = gds.is_licensed()

1.4. Specifying targeted database

If we don’t want to use the default database of our DBMS we can provide the GraphDataScience constructor with the keyword parameter database:

gds = GraphDataScience(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD), database="my-db")

Or we could change the database we are targeting later:

gds.set_database("my-db")

1.5. Configure Apache Arrow parameters

If Apache Arrow is available on the server, we can provide the GraphDataScience constructor with several keyword parameters to configure the connection:

  • arrow_disable_server_verification: A flag indicating that the Flight client should skip server verification when connecting with TLS. If this is enabled, all other TLS settings are overridden.

  • arrow_tls_root_certs: PEM-encoded certificates that are used for connecting to the Apache Arrow Flight server.

gds = GraphDataScience(
  NEO4J_URI,
  auth=(NEO4J_USER, NEO4J_PASSWORD),
  arrow=True,
  arrow_disable_server_verification=False,
  arrow_tls_root_certs=CERT
)
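The CERT placeholder above is not defined in the snippet. One way it might be obtained, assuming the certificates live in a PEM file on disk and are passed to the client as raw bytes, is sketched below. The helper name arrow_settings and the file path in the usage note are hypothetical.

```python
from pathlib import Path


def arrow_settings(cert_path):
    """Build the Arrow-related keyword arguments used above.

    Assumes `cert_path` points to a PEM file and that the certificates
    are handed to the client as raw bytes.
    """
    return {
        "arrow": True,
        "arrow_disable_server_verification": False,
        "arrow_tls_root_certs": Path(cert_path).read_bytes(),
    }
```

The returned dictionary could then be unpacked into the constructor call, for example GraphDataScience(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD), **arrow_settings("ca.pem")).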

2. Minimal example

In the following example, we illustrate how to use the Python client to run a Cypher query, project a graph into GDS, run an algorithm and inspect the result via the client-side graph object. We assume that a GraphDataScience object has already been created and stored in the variable gds.

# Create a minimal example graph
gds.run_cypher(
  """
  CREATE
  (m: City {name: "Malmö"}),
  (l: City {name: "London"}),
  (s: City {name: "San Mateo"}),
  (m)-[:FLY_TO]->(l),
  (l)-[:FLY_TO]->(m),
  (l)-[:FLY_TO]->(s),
  (s)-[:FLY_TO]->(l)
  """
)

# Project the graph into the GDS Graph Catalog
# We call the object representing the projected graph `G_office`
G_office, project_result = gds.graph.project("neo4j-offices", "City", "FLY_TO")

# Run the mutate mode of the PageRank algorithm
mutate_result = gds.pageRank.mutate(G_office, tolerance=0.5, mutateProperty="rank")

# We can inspect the node properties of our projected graph directly
# via the graph object and see that indeed the new property exists
assert G_office.node_properties("City") == ["rank"]

You can also use one of the datasets that come with the library to get started. See the Datasets chapter for more on this.

The client library is designed so that most methods are not defined statically. Instead, they are inferred under the hood from the attribute names you write, via a string-building scheme that overloads the magic __getattr__ method. Therefore most methods, such as pageRank, will not appear when calling dir(gds). Similarly, IDEs and language servers will not be able to detect these automatically inferred methods, so the auto-completion support they provide will be limited. Despite this lack of discoverability, the inferred methods, such as gds.pageRank.stream, will still be called correctly.
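To make the general technique more concrete, here is a toy sketch of string building via __getattr__. It is not the client's actual implementation: the class name EndpointBuilder, the shape of the generated query and the parameter names are all assumptions chosen for illustration.

```python
class EndpointBuilder:
    """Toy stand-in that builds a dotted procedure name via __getattr__.

    This is NOT the client's real implementation, only an illustration
    of the general technique.
    """

    def __init__(self, namespace="gds"):
        self._namespace = namespace

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so every
        # unknown attribute extends the dotted name by one segment.
        if name.startswith("_"):
            raise AttributeError(name)
        return EndpointBuilder(f"{self._namespace}.{name}")

    def __call__(self, graph_name, **config):
        # A real client would send the query to the server; this toy
        # version just returns the Cypher string and its parameters.
        query = f"CALL {self._namespace}($graph_name, $config)"
        return query, {"graph_name": graph_name, "config": config}


toy = EndpointBuilder()
query, params = toy.pageRank.stream("neo4j-offices", tolerance=0.5)
print(query)                   # CALL gds.pageRank.stream($graph_name, $config)
print("pageRank" in dir(toy))  # False
```

Since pageRank and stream are never defined as attributes, they are invisible to dir and to static analysis, which mirrors the limitation described above.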

3. Running Cypher

As we saw in the example above, the GraphDataScience object has a method run_cypher for conveniently running Cypher queries. This method takes as parameters a query string query: str, an optional Cypher parameters dictionary params: Optional[Dict[str, Any]] as well as an optional string database: Optional[str] to override which database to target. It returns the result of the query in the format of a pandas DataFrame.
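As a small sketch of the params argument, the helper below passes a city name as a Cypher parameter instead of formatting it into the query string. The helper name cities_reachable_from and the query itself are illustrative assumptions based on the minimal example graph.

```python
def cities_reachable_from(gds, city_name):
    """Return the cities one FLY_TO hop away from `city_name`.

    `gds` is expected to be a GraphDataScience object; the result is
    whatever `run_cypher` returns (a pandas DataFrame per the text above).
    """
    query = """
    MATCH (:City {name: $name})-[:FLY_TO]->(c:City)
    RETURN c.name AS destination
    ORDER BY destination
    """
    return gds.run_cypher(query, params={"name": city_name})
```

With the graph from the minimal example in place, cities_reachable_from(gds, "London") would be expected to list Malmö and San Mateo.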

4. Close open connections

Similarly to how the Neo4j Python driver supports closing all open connections to the DBMS, you can call close on the GraphDataScience object to the same effect:

# Close any open connections in the underlying Neo4j driver's connection pool
gds.close()

close is also called automatically when the GraphDataScience object is deleted.
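If we prefer not to rely on object deletion, the standard library helper contextlib.closing can call close for us at the end of a with block, even when an exception is raised inside it. The sketch below uses a stand-in class, FakeClient (an assumption made so that the snippet runs without a server), in place of a real GraphDataScience object.

```python
from contextlib import closing


class FakeClient:
    """Stand-in for a GraphDataScience object; only models close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


client = FakeClient()
with closing(client) as gds:
    pass  # project graphs, run algorithms, ... using `gds`

print(client.closed)  # True
```

Any object with a close method works with closing, so the same pattern should apply to a real GraphDataScience object.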

5. Mapping between Cypher and Python

There are some general principles for how the Cypher API maps to the Python client API:

  • Method calls corresponding to Cypher procedures (preceded by CALL in the docs) return:

    • A table as a pandas DataFrame, if the procedure returns several rows (e.g. stream mode algorithm calls).

    • A row as a pandas Series, if the procedure returns exactly one row (e.g. stats mode algorithm calls).

    Some notable exceptions to this are:

    • Procedures instantiating graph objects and model objects have two return values: a graph or model object, and a row of metadata (typically a pandas Series) from the underlying procedure call.

    • Any methods on pipeline, graph or model objects (native to the Python client) mapping to Cypher procedures.

    • gds.version() which returns a string.

  • Method calls corresponding to Cypher functions (preceded by RETURN in the docs) will simply return the value the function returns.

  • The Python client also contains specific functionality for inspecting graphs from the GDS Graph Catalog, using a client-side graph object. Similarly, models from the GDS Model Catalog can be inspected using a client-side model object.

  • Cypher functions and procedures of GDS that take references to graphs and/or models as strings for input typically instead take graph objects and/or model objects as input in the Python client API.

  • To configure and use machine learning pipelines in GDS, specific pipeline objects are used in the Python client.
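The first few principles can be checked with a short helper that calls one method per category and reports the type of each return value. The helper name summarize_return_types is an assumption for this sketch; gds and G are a GraphDataScience object and a graph object as in the minimal example.

```python
def summarize_return_types(gds, G):
    """Call one method per category and report the type each returns.

    `gds` is a GraphDataScience object and `G` a graph object, as in
    the minimal example.
    """
    stream_result = gds.pageRank.stream(G)  # procedure, several rows
    stats_result = gds.pageRank.stats(G)    # procedure, exactly one row
    version = gds.version()                 # listed exception
    return {
        "stream": type(stream_result).__name__,
        "stats": type(stats_result).__name__,
        "version": type(version).__name__,
    }
```

According to the principles listed above, running this against a real server should report DataFrame for stream, Series for stats and str for version.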