The graph object

To use most of the functionality in GDS, you must first project a graph into the GDS Graph Catalog. When projecting a graph with the Python client, a client-side reference to the projected graph is returned. We call these references Graph objects.

Once created, the Graph objects can be passed as arguments to other methods in the Python client, for example for running algorithms or training machine learning models. Additionally, the Graph objects have convenience methods allowing for inspection of the projected graph represented without explicitly involving the graph catalog.

In the examples below we assume that we have an instantiated GraphDataScience object called gds. Read more about this in Getting started.

1. Projecting a graph object

There are several ways of projecting a graph object. The simplest way is to do a native projection:

# We put this simple graph in our database
gds.run_cypher(
  """
  CREATE
    (m: City {name: "Malmö"}),
    (l: City {name: "London"}),
    (s: City {name: "San Mateo"}),

    (m)-[:FLY_TO]->(l),
    (l)-[:FLY_TO]->(m),
    (l)-[:FLY_TO]->(s),
    (s)-[:FLY_TO]->(l)
  """
)

# We estimate required memory of the operation
res = gds.graph.project.estimate(
    ["City"],                   #  Node projection
    "FLY_TO",                   #  Relationship projection
    readConcurrency=4           #  Configuration parameters
)
assert res["bytesMax"] < 1e12

G, result = gds.graph.project(
    "offices",                  #  Graph name
    ["City"],                   #  Node projection
    "FLY_TO",                   #  Relationship projection
    readConcurrency=4           #  Configuration parameters
)

assert G.node_count() == result["nodeCount"]

where G is a Graph object, and result is a pandas Series containing metadata from the underlying procedure call.

Note that all projection syntax variants are supported by way of specifying a Python dict or list for the node and relationship projection arguments. To specify configuration parameters corresponding to the keys of the procedure’s configuration map, we give named keyword arguments, like for readConcurrency=4 above. Read more about the syntax in the GDS manual.
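
As a minimal sketch of the dict-based variant, the snippet below describes the City/FLY_TO example as an undirected projection that also carries a node property. The graph name offices_undirected and the population property are assumptions made for illustration; properties and orientation are keys of the GDS native projection syntax.

```python
# Node projection as a dict: label -> projection settings.
# Assumes the City nodes carry a numeric `population` property.
NODE_PROJECTION = {"City": {"properties": ["population"]}}

# Relationship projection as a dict: type -> projection settings.
RELATIONSHIP_PROJECTION = {"FLY_TO": {"orientation": "UNDIRECTED"}}


def project_undirected_offices(gds):
    """Project the example graph; `gds` is an instantiated GraphDataScience object."""
    G, result = gds.graph.project(
        "offices_undirected",       # Graph name (hypothetical)
        NODE_PROJECTION,            # Node projection given as a dict
        RELATIONSHIP_PROJECTION,    # Relationship projection given as a dict
        readConcurrency=4,          # Key of the procedure's configuration map
    )
    return G, result
```

Keeping the projections in named constants makes it easy to pass the same values to gds.graph.project.estimate first.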

Similarly to Cypher there’s also a corresponding gds.graph.project.estimate method that can be called in an analogous way.

To get a graph object that represents a graph that has already been projected into the graph catalog, call the client-side only get method and pass it a name:

G = gds.graph.get("offices")

For users who are GDS admins, gds.graph.get will resolve graph names into Graph objects also when the provided name refers to another user’s graph projection.
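
A common pattern built on get is to reuse an existing projection and only project when the name is missing from the catalog. The helper below is a sketch of that idea, not part of the client; get_or_project is a hypothetical name, and it relies on gds.graph.exists returning a result with an exists entry, as used elsewhere on this page.

```python
def get_or_project(gds, graph_name, node_projection, relationship_projection, **config):
    """Return a Graph object for `graph_name`, projecting it only if it is missing."""
    if gds.graph.exists(graph_name)["exists"]:
        # Client-side only: resolves the name without creating a new projection
        return gds.graph.get(graph_name)
    G, _ = gds.graph.project(
        graph_name, node_projection, relationship_projection, **config
    )
    return G
```

A call such as get_or_project(gds, "offices", ["City"], "FLY_TO", readConcurrency=4) would then be safe to repeat within a session.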

In addition to those aforementioned there are five more methods that create graph objects:

  • gds.graph.project.cypher (This is the Legacy Cypher projection, see Projecting a graph using Cypher Projection for the new Cypher projection)

  • gds.graph.filter

  • gds.graph.generate

  • gds.graph.sample.rwr

  • gds.graph.sample.cnarw

Their Cypher signatures map to Python in much the same way as gds.graph.project above.

2. Projecting a graph using Cypher Projection

The method gds.graph.cypher.project allows for projecting a graph using Cypher projection. Cypher projection is not a dedicated procedure; rather it requires writing a Cypher query that calls the gds.graph.project aggregation function.

Read more about Cypher projection in the GDS manual.

The method gds.graph.cypher.project bridges the gap between gds.run_cypher and having to follow with gds.graph.get.

2.1. Syntax

Table 1. Cypher projection signature

Name | Type | Default | Description
query | str | - | The Cypher query to be executed. Must end with RETURN gds.graph.project(…​).
database | Optional[str] | None | Overrides the target database. The default uses the database from the connection.
**params | Any | {} | The query parameters as keyword args.

Unlike gds.run_cypher but very much like gds.graph.project, it returns a tuple of a Graph object and a pandas Series containing metadata from Cypher execution.

The method does not modify the Cypher query in any way; all projection configuration must be done in the query itself. The method does, however, verify that the query contains only a single RETURN gds.graph.project(…​) clause and that this clause appears at the end of the query. This means that the method cannot be used to project a graph together with other aggregations in Cypher, nor can the result row be renamed. If the provided query fails the validation, the fallback is to use gds.run_cypher together with gds.graph.get to achieve the same result.

# We put this simple graph in our database
gds.run_cypher(
  """
  CREATE
    (m: City {name: "Malmö"}),
    (l: City {name: "London"}),
    (s: City {name: "San Mateo"}),

    (m)-[:FLY_TO]->(l),
    (l)-[:FLY_TO]->(m),
    (l)-[:FLY_TO]->(s),
    (s)-[:FLY_TO]->(l)
  """
)

G, result = gds.graph.cypher.project(
    """
    MATCH (n)-->(m)
    RETURN gds.graph.project($graph_name, n, m, {
        sourceNodeLabels: $label,
        targetNodeLabels: $label,
        relationshipType: $rel_type
    })
    """,                   #  Cypher query
    database="neo4j",      #  Target database
    graph_name="offices",  #  Query parameter
    label="City",          #  Query parameter
    rel_type="FLY_TO"      #  Query parameter
)

assert G.node_count() == result["nodeCount"]
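
For queries that do not pass the validation described above, the fallback of gds.run_cypher followed by gds.graph.get could be wrapped as sketched here. The helper name run_then_get and the example query are hypothetical; the sketch assumes that gds.run_cypher accepts query parameters through a params dict and that the query names the graph with $graph_name.

```python
# The RETURN column is renamed, which the validation rule above does not allow
# for gds.graph.cypher.project.
QUERY = """
MATCH (n)-->(m)
RETURN gds.graph.project($graph_name, n, m) AS projection
"""


def run_then_get(gds, query, graph_name, **params):
    """Run a projecting query, then resolve the resulting Graph object by name."""
    # graph_name is forwarded as a query parameter alongside the other params
    result = gds.run_cypher(query, params={"graph_name": graph_name, **params})
    G = gds.graph.get(graph_name)
    return G, result
```

Unlike gds.graph.cypher.project, the second element returned here is whatever gds.run_cypher returns, not a Series of projection metadata.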

3. Constructing a graph from DataFrames

In addition to projecting a graph from the Neo4j database, it is also possible to create graphs directly from pandas DataFrame objects.

3.1. Syntax

Table 2. Graph construct signature

Name | Type | Default | Description
graph_name | str | - | Name of the graph to be constructed.
nodes | Union[DataFrame, List[DataFrame]] | - | One or more data frames containing node data.
relationships | Union[DataFrame, List[DataFrame]] | - | One or more data frames containing relationship data.
concurrency | int | 4 | Number of threads used to construct the graph.
undirected_relationship_types | Optional[List[str]] | None | List of relationship types to be projected as undirected.

3.2. Example

import pandas

nodes = pandas.DataFrame(
    {
        "nodeId": [0, 1, 2, 3],
        "labels":  ["A", "B", "C", "A"],
        "prop1": [42, 1337, 8, 0],
        "otherProperty": [0.1, 0.2, 0.3, 0.4]
    }
)

relationships = pandas.DataFrame(
    {
        "sourceNodeId": [0, 1, 2, 3],
        "targetNodeId": [1, 2, 3, 0],
        "relationshipType": ["REL", "REL", "REL", "REL"],
        "weight": [0.0, 0.0, 0.1, 42.0]
    }
)

G = gds.graph.construct(
    "my-graph",      # Graph name
    nodes,           # One or more dataframes containing node data
    relationships    # One or more dataframes containing relationship data
)

assert "REL" in G.relationship_types()

The above example creates a graph from two DataFrame objects, one for nodes and one for relationships. The projected graph is equivalent to a graph that the following Cypher query would create in a Neo4j database:

CREATE
    (a:A {prop1: 42,    otherProperty: 0.1}),
    (b:B {prop1: 1337,  otherProperty: 0.2}),
    (c:C {prop1: 8,     otherProperty: 0.3}),
    (d:A {prop1: 0,     otherProperty: 0.4}),

    (a)-[:REL {weight: 0.0}]->(b),
    (b)-[:REL {weight: 0.0}]->(c),
    (c)-[:REL {weight: 0.1}]->(d),
    (d)-[:REL {weight: 42.0}]->(a)

The supported format for the node data frames is described in Arrow node schema and the format for the relationship data frames is described in Arrow relationship schema.

3.3. Apache Arrow flight server support

The construct method can utilize the Apache Arrow Flight Server of GDS if it’s enabled. This in particular means that:

  • The construction of the graph is greatly sped up.

  • It is possible to supply more than one data frame, both for nodes and relationships. If multiple node data frames are used, the node ids must be distinct across all of them.

  • Prior to the construct call, a call to GraphDataScience.set_database must have been made to explicitly specify which Neo4j database should be targeted.
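
Since node ids must be distinct across all node data frames, it can be useful to check this on the client before calling construct. The following is a plain-Python sketch of such a pre-check, not something the client provides; duplicated_node_ids is a hypothetical helper, and with pandas each input list would typically come from a call like df["nodeId"].tolist().

```python
from collections import Counter


def duplicated_node_ids(node_id_columns):
    """Return, in sorted order, the node ids that occur more than once overall."""
    counts = Counter(
        node_id for column in node_id_columns for node_id in column
    )
    return sorted(node_id for node_id, n in counts.items() if n > 1)


# Hypothetical nodeId columns of two node data frames; id 2 appears in both
people_ids = [0, 1, 2]
product_ids = [2, 3]

clashes = duplicated_node_ids([people_ids, product_ids])
```

An empty result means the data frames can be passed together; here clashes contains the id 2, which would have to be renumbered first.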

3.4. Limitations on community edition

For users of GDS community edition, performance can be impacted for large graphs, and the socket connection with the database may time out. If this happens, a possible workaround is to modify the server configuration server.bolt.connection_keep_alive or server.bolt.connection_keep_alive_probes. Be aware of the side effects, however: a genuine connection issue will then take longer to be detected.

4. Loading a NetworkX graph

Another way to construct a graph from client-side data is by using the library’s convenience NetworkX loading method. In order to use this method, one has to install NetworkX support for the graphdatascience library:

pip install graphdatascience[networkx]

The method that exposes the NetworkX dataset loading functionality is called gds.graph.networkx.load. It returns a Graph object, and takes three arguments:

Table 3. NetworkX graph loading method arguments

Name | Type | Description
nx_G | networkx.Graph | A graph in the NetworkX format
graph_name | str | The name of the created GDS graph
concurrency | int = 4 | An optional number of threads to use

Exactly how the networkx.Graph nx_G maps to a GDS Graph projection is outlined in detail below.

4.1. Example

Let’s look at an example of loading a minimal heterogeneous toy NetworkX graph.

Example of loading a directed NetworkX graph
import networkx as nx

# Construct a directed NetworkX graph
nx_G = nx.DiGraph()
nx_G.add_node(1, labels=["Person"], age=52)
nx_G.add_node(42, labels=["Product", "Item"], cost=17.2)
nx_G.add_edge(1, 42, relationshipType="BUYS", quantity=4)

# Load the graph into GDS
G = gds.graph.networkx.load(nx_G, "purchases")

# Verify that the projection is what we expect
assert G.name() == "purchases"
assert G.node_count() == 2
assert set(G.node_labels()) == {"Person", "Product", "Item"}
assert G.node_properties("Person") == ["age"]
assert G.node_properties("Product") == ["cost"]
# A relationship count of 1 (rather than 2) shows that the graph is indeed directed
assert G.relationship_count() == 1
assert G.relationship_types() == ["BUYS"]
assert G.relationship_properties("BUYS") == ["quantity"]

Combined with NetworkX’s functionality for reading various graph formats one can easily load popular graph formats like edge list and GML into GDS.

Combined with NetworkX’s functionality for generating various kinds of graphs one can easily load graph types popular in the literature into GDS, such as expanders, lollipop graphs, complete graphs, and more.

4.2. NetworkX schema to GDS schema

There are some rules as to how the NetworkX graph maps to the projected Graph in GDS. They follow principles similar to those of Constructing a graph from DataFrames, and are outlined in detail in this section.

4.2.1. Node labels

Node labels for the projected GDS graph are taken from attributes on networkx.Graph nodes. The values of node attribute key labels will dictate what labels nodes are given in the projection. These values should be either strings or lists of strings.

Either all nodes of the networkx.Graph must have valid labels attributes, or the labels attribute must be absent from every node. That is, a networkx.Graph with no labels node attributes at all is also allowed. In this latter case, the nodes in the projected Graph will all have the node label N.

4.2.2. Node properties

Node properties for the projected GDS graph are taken from attributes on networkx.Graph nodes. The keys of the attributes will map to property names, and the allowed values must follow the regular guidelines for node property values in GDS. Please note though that the node attribute key labels is reserved for node labels (previous section) and will not translate to node properties in the projection.

4.2.3. Relationship types

Relationship types for the projected GDS graph are taken from attributes on networkx.Graph edges. The values of edge attribute key relationshipType will dictate what types relationships are given in the projection. These values should be either strings or omitted.

Either all edges of the networkx.Graph must have valid relationshipType attributes, or the relationshipType attribute must be absent from every edge. That is, a networkx.Graph with no relationshipType edge attributes at all is also allowed. In this latter case, the relationships in the projected Graph will all have the relationship type R.

4.2.4. Relationship properties

Relationship properties for the projected GDS graph are taken from attributes on networkx.Graph edges. The keys of the attributes will map to property names, and the allowed values must follow the regular guidelines for relationship property values in GDS. Please note though that the edge attribute key relationshipType is reserved for relationship types (previous section) and will not translate to relationship properties in the projection.
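
The label, type and property rules above can be summarized in a few lines of plain Python. This is only an illustration of the documented rules on attribute dicts, not the client's implementation; split_reserved is a hypothetical helper and the sample attributes mirror the example graph from this section.

```python
def split_reserved(attribute_dicts, reserved_key, default):
    """Split attribute dicts into (reserved values, remaining properties).

    The reserved key must be present on every element or on none of them;
    if it is absent everywhere, `default` is used for all elements.
    """
    present = [reserved_key in attrs for attrs in attribute_dicts]
    if any(present) and not all(present):
        raise ValueError(f"'{reserved_key}' must be set on all elements or on none")
    reserved = [attrs.get(reserved_key, default) for attrs in attribute_dicts]
    properties = [
        {key: value for key, value in attrs.items() if key != reserved_key}
        for attrs in attribute_dicts
    ]
    return reserved, properties


node_attrs = [
    {"labels": ["Person"], "age": 52},
    {"labels": ["Product", "Item"], "cost": 17.2},
]
edge_attrs = [{"quantity": 4}]  # no relationshipType on any edge

node_labels, node_props = split_reserved(node_attrs, "labels", "N")
rel_types, rel_props = split_reserved(edge_attrs, "relationshipType", "R")
```

Here the nodes keep their own labels, while the single edge falls back to the default relationship type R; in both cases the reserved key is excluded from the properties.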

4.2.5. Relationship direction

The direction (DIRECTED or UNDIRECTED) of the relationships in the projected GDS graph is inferred from the type of networkx.Graph used. If the given NetworkX graph is directed, that is, an instance of networkx.DiGraph, networkx.MultiDiGraph, or a subclass of either, the relationships will be DIRECTED in the projection. Otherwise, they will be UNDIRECTED.

4.3. Limitations on community edition

For users of GDS community edition, performance can be impacted for large graphs, and the socket connection with the database may time out. If this happens, a possible workaround is to modify the server configuration server.bolt.connection_keep_alive or server.bolt.connection_keep_alive_probes. Be aware of the side effects, however: a genuine connection issue will then take longer to be detected.

5. Inspecting a graph object

There are convenience methods on the graph object that let us extract information about our projected graph.

Table 4. Graph object methods

Name | Arguments | Return type | Description
name | - | str | The name of the projected graph.
database | - | str | Name of the database in which the graph has been projected.
node_count | - | int | The node count of the projected graph.
relationship_count | - | int | The relationship count of the projected graph.
node_labels | - | list[str] | A list of the node labels present in the graph.
relationship_types | - | list[str] | A list of the relationship types present in the graph.
node_properties | label: Optional[str] | Union[Series, list[str]] | If the label argument is given, returns a list of the properties present on the nodes with the provided node label. Otherwise, returns a Series mapping every node label to a list of the properties present on nodes with that label.
relationship_properties | type: Optional[str] | Union[Series, list[str]] | If the type argument is given, returns a list of the properties present on the relationships with the provided relationship type. Otherwise, returns a Series mapping every relationship type to a list of the properties present on relationships with that type.
degree_distribution | - | Series | The degree distribution of the graph.
density | - | float | Density of the graph.
size_in_bytes | - | int | Number of bytes used in the Java heap to store the graph.
memory_usage | - | str | Human-readable description of size_in_bytes.
exists | - | bool | Returns True if the graph exists in the GDS Graph Catalog, otherwise False.
drop | failIfMissing: Optional[bool] | Series | Removes the graph from the GDS Graph Catalog.
configuration | - | Series | The configuration used to project the graph in memory.
creation_time | - | neo4j.time.Datetime | Time when the graph was projected.
modification_time | - | neo4j.time.Datetime | Time when the graph was last modified.

For example, to get the node count and node properties of a graph G, we would do the following:

n = G.node_count()
props = G.node_properties("City")
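
Several of these methods can be combined into a small report. The helper below is a sketch using only methods from the table above; summarize is a hypothetical name and not part of the client.

```python
def summarize(G):
    """Collect a few headline facts about a projected graph into a dict."""
    return {
        "name": G.name(),
        "nodes": G.node_count(),
        "relationships": G.relationship_count(),
        "labels": sorted(G.node_labels()),
        "types": sorted(G.relationship_types()),
        "memory": G.memory_usage(),
    }
```

Each value is fetched through its own method call on G when summarize runs, so the dict is a snapshot that does not update if the projection later changes.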

6. Context management

The graph object also implements the context management protocol, i.e., it is usable inside with clauses. On exiting the with block, the graph projection will be automatically dropped on the server side.

# We use the example graph from the `Projecting a graph object` section
with gds.graph.project(
    "tmp_offices",              #  Graph name
    ["City"],                   #  Node projection
    "FLY_TO",                   #  Relationship projection
    readConcurrency=4           #  Configuration parameters
)[0] as G_tmp:
    assert G_tmp.exists()

# Outside of the with block the Graph does not exist
assert not gds.graph.exists("tmp_offices")["exists"]
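
To make the protocol itself concrete, the toy class below mimics the drop-on-exit behavior with a Python set standing in for the graph catalog. It is a hypothetical stand-in for illustration and says nothing about how the client's Graph class is implemented.

```python
class TemporaryProjection:
    """Toy stand-in for a Graph object; NOT the client's implementation."""

    def __init__(self, name, catalog):
        self._name = name
        self._catalog = catalog
        catalog.add(name)          # stands in for projecting the graph

    def exists(self):
        return self._name in self._catalog

    def drop(self):
        self._catalog.discard(self._name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.drop()                # runs on normal exit and on exceptions
        return False               # do not swallow exceptions


catalog = set()
with TemporaryProjection("tmp_offices", catalog) as G_tmp:
    existed_inside = G_tmp.exists()
exists_after = "tmp_offices" in catalog
```

In this toy version the cleanup also runs when the body of the with block raises, which is the usual reason to prefer a with clause over a manual drop call.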

7. Using a graph object

The primary use case for a graph object is to pass it to algorithms, but it’s also the input to most methods of the GDS Graph Catalog.

7.1. Input to algorithms

The Python client syntax for using a Graph as input to an algorithm follows the GDS Cypher procedure API, where the graph is the first parameter passed to the algorithm.

Syntax composition:
result = gds[.<tier>].<algorithm>.<execution-mode>[.<estimate>](
  G: Graph,
  **configuration: dict[str, any]
)

In this example we run the degree centrality algorithm on a graph G:

result = gds.degree.mutate(G, mutateProperty="degree")
assert "centralityDistribution" in result
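
Following the syntax composition above, the optional estimate suffix can be used to guard an algorithm run. The sketch below assumes that the estimate result exposes a bytesMax entry, as the graph projection estimate earlier on this page does; mutate_degree_if_it_fits and the max_bytes threshold are hypothetical.

```python
def mutate_degree_if_it_fits(gds, G, max_bytes):
    """Run degree centrality in mutate mode only if the estimate allows it."""
    estimate = gds.degree.mutate.estimate(G, mutateProperty="degree")
    if estimate["bytesMax"] > max_bytes:
        # Skip the run; the caller decides how to handle a too-large estimate
        return None
    return gds.degree.mutate(G, mutateProperty="degree")
```

The same configuration keywords are passed to both calls so that the estimate describes the run that follows.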

7.2. The graph catalog

All procedures of the GDS Graph Catalog have corresponding Python methods in the client. Of those catalog procedures that take a graph name string as input, their Python client equivalents instead take a Graph object, with the exception of gds.graph.exists which still takes a graph name string.

Below are some examples of how the GDS Graph Catalog can be used via the client, assuming we inspect the graph G from the example above:

# List graphs in the catalog
list_result = gds.graph.list()

# Check for existence of a graph in the catalog
exists_result = gds.graph.exists("offices")
assert exists_result["exists"]

# Stream the node property 'degree'
result = gds.graph.nodeProperty.stream(G, node_property="degree")

# Drop a graph; same as G.drop()
gds.graph.drop(G)
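
The difference between name-based and object-based catalog methods shows up when cleaning up a graph that may or may not exist. The helper below is a sketch built only from the calls shown above; drop_if_exists is a hypothetical name.

```python
def drop_if_exists(gds, graph_name):
    """Drop the named graph if it is in the catalog; return True if dropped."""
    if not gds.graph.exists(graph_name)["exists"]:   # takes a name string
        return False
    G = gds.graph.get(graph_name)                    # name -> Graph object
    gds.graph.drop(G)                                # takes a Graph object
    return True
```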

7.2.1. Streaming properties

The client methods that stream node and relationship properties are greatly sped up if the Apache Arrow Flight Server of GDS is enabled.

Additionally, setting the optional client-only keyword parameter separate_property_columns=True (it defaults to False) for gds.graph.streamNodeProperties and gds.graph.streamRelationshipProperties returns a pandas DataFrame in which each requested property has its own column. This differs from the default behavior, in which a single column called propertyValue contains all requested properties, interleaved for each node or relationship.
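
The difference between the two shapes can be illustrated without a server. The plain-Python sketch below pivots interleaved rows into one row per node; the column names nodeId and nodeProperty, the sample values, and the local function separate_property_columns are assumptions made for the illustration, and the code does not reproduce how the client performs the conversion.

```python
# Default shape: one row per (node, property) pair, values interleaved
interleaved = [
    {"nodeId": 0, "nodeProperty": "degree", "propertyValue": 1.0},
    {"nodeId": 0, "nodeProperty": "population", "propertyValue": 360000},
    {"nodeId": 1, "nodeProperty": "degree", "propertyValue": 2.0},
    {"nodeId": 1, "nodeProperty": "population", "propertyValue": 8800000},
]


def separate_property_columns(rows):
    """Pivot interleaved rows into one row per node, one column per property."""
    by_node = {}
    for row in rows:
        node_row = by_node.setdefault(row["nodeId"], {"nodeId": row["nodeId"]})
        node_row[row["nodeProperty"]] = row["propertyValue"]
    return list(by_node.values())


separated = separate_property_columns(interleaved)
```

After the pivot, each node has a single row with a degree and a population entry, which is usually the more convenient shape for further analysis.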

7.2.2. Including node properties from Neo4j

Node properties such as names and descriptions are useful to understand the output of an algorithm, even if not needed to run the algorithm itself. To fetch additional node properties directly from the Neo4j database, you can use the db_node_properties client-only parameter of the gds.graph.nodeProperty.stream and gds.graph.nodeProperties.stream methods.

In the following example, the City nodes have both a numeric and a String property. The stream method retrieves the values of the database-only name property alongside the values of the projected population property.

gds.run_cypher(
  """
  CREATE
    (m: City {name: "Malmö", population: 360000}),
    (l: City {name: "London", population: 8800000}),
    (s: City {name: "San Mateo", population: 105000}),

    (m)-[:FLY_TO]->(l),
    (l)-[:FLY_TO]->(m),
    (l)-[:FLY_TO]->(s),
    (s)-[:FLY_TO]->(l)
  """
)

G, result = gds.graph.project(
    "offices",
    {
        "City": {
            "properties": ["population"]
        }
    },
    "FLY_TO"
)

gds.graph.nodeProperties.stream(G, node_properties=["population"], db_node_properties=["name"])

7.2.3. Streaming topology by relationship type

The type returned from the Python client method corresponding to gds.beta.graph.relationships.stream is called TopologyDataFrame and inherits from the standard pandas DataFrame. TopologyDataFrame comes with an additional convenience method named by_rel_type, which takes no arguments and returns a dictionary of the form Dict[str, List[List[int]]]. This dictionary maps relationship types as strings to 2 x m matrices, where m is the number of relationships of the given type. The first row of each such matrix contains the source node ids of the relationships, and the second row contains the corresponding target node ids.

We can illustrate this transformation with an example using our graph G from the construct example above:

topology_by_rel_type = gds.beta.graph.relationships.stream(G).by_rel_type()

assert list(topology_by_rel_type.keys()) == ["REL"]
assert topology_by_rel_type["REL"][0] == [0, 1, 2, 3]
assert topology_by_rel_type["REL"][1] == [1, 2, 3, 0]
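
The grouping performed by by_rel_type can be pictured with a short plain-Python sketch operating on (source, target, type) rows. It illustrates the output shape described above and is not the client's implementation; the local function by_rel_type and the extra OTHER relationship are hypothetical.

```python
def by_rel_type(rows):
    """Group (source, target, type) rows into {type: [[sources], [targets]]}."""
    result = {}
    for source, target, rel_type in rows:
        sources, targets = result.setdefault(rel_type, [[], []])
        sources.append(source)
        targets.append(target)
    return result


rows = [
    (0, 1, "REL"),
    (1, 2, "REL"),
    (2, 3, "REL"),
    (3, 0, "REL"),
    (0, 2, "OTHER"),   # hypothetical second relationship type
]
topology = by_rel_type(rows)
```

Each value is a pair of equally long lists, so topology["REL"][0][i] and topology["REL"][1][i] together describe the i-th relationship of that type.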

Like the methods described in Streaming properties, the gds.beta.graph.relationships.stream method is also accelerated if the GDS Apache Arrow Flight Server is enabled.