The graph object

In order to utilize most the functionality in GDS, you must first project a graph into the GDS Graph Catalog. When projecting a graph with the Python client, a client-side reference to the projected graph is returned. We call these references Graph objects.

Once created, the Graph objects can be passed as arguments to other methods in the Python client, for example for running algorithms or training machine learning models. Additionally, the Graph objects have convenience methods allowing for inspection of the projected graph represented without explicitly involving the graph catalog.

In the examples below we assume that we have an instantiated GraphDataScience object called gds. Read more about this in Getting started.

1. Projecting a graph object

There are several ways of projecting a graph object. The simplest way is to do a native projection:

G, res = gds.graph.project(
    "my-graph",                 #  Graph name
    ["MyLabel", "YourLabel"],   #  Node projection
    "MY_REL_TYPE",              #  Relationship projection
    readConcurrency=4           #  Configuration parameters
)

where G is a Graph object, and res is a pandas Series containing metadata from the underlying procedure call.

Note that all projection syntax variants are supported by way of specifying a Python dict or list for the node and relationship projection arguments. To specify configuration parameters corresponding to the keys of the procedure’s configuration map, we give named keyword arguments, like for readConcurrency=4 above. Read more about this in Syntax.

Similarly to Cypher there’s also a corresponding gds.graph.project.estimate method that can be called in an analogous way.

To get a graph object that represents a graph that has already been projected into the graph catalog, one can call the client-side only get method and passing it a name:

G = gds.graph.get("my-graph")

In addition to those aforementioned there are four more methods that create graph objects:

  • gds.graph.project.cypher

  • gds.beta.graph.subgraph

  • gds.beta.graph.generate

  • gds.alpha.graph.sample.rwr

Their Cypher signatures map to Python in much the same way as gds.graph.project above.

2. Constructing a graph

This feature is in the alpha tier. For more information on feature tiers, see the GDS manual.

Instead of projecting a graph from the Neo4j database it is also possible to construct new graphs using pandas DataFrames from the client.

nodes = pandas.DataFrame(
    {
        "nodeId": [0, 1, 2, 3],
        "labels":  ["A", "B", "C", "A"],
        "prop1": [42, 1337, 8, 0],
        "otherProperty": [0.1, 0.2, 0.3, 0.4]
    }
)

relationships = pandas.DataFrame(
    {
        "sourceNodeId": [0, 1, 2, 3],
        "targetNodeId": [1, 2, 3, 0],
        "relationshipType": ["REL", "REL", "REL", "REL"],
        "weight": [0.0, 0.0, 0.1, 42.0]
    }
)

G = gds.alpha.graph.construct(
    "my-graph",      # Graph name
    nodes,           # One or more dataframes containing node data
    relationships    # One or more dataframes containing relationship data
)

The above example creates a simple graph using one node and one relationship DataFrame. The created graph is equivalent to a graph created by the following Cypher query:

CREATE
    (a:A {prop1: 42,    otherProperty: 0.1),
    (b:B {prop1: 1337,  otherProperty: 0.2),
    (c:C {prop1: 8,     otherProperty: 0.3),
    (d:A {prop1: 0,     otherProperty: 0.4),
    (a)-[:REL {weight: 0.0}]->(b),
    (b)-[:REL {weight: 0.0}]->(c),
    (c)-[:REL {weight: 0.1}]->(d),
    (d)-[:REL {weight: 42.0}]->(a),

The supported format for the node data frames is described in Arrow node schema and the format for the relationship data frames is described in Arrow relationship schema.

2.1. Arrow flight server support

The construct method can utilize the Arrow Flight Server of GDS if it’s enabled. This in particular means that:

  • The construction of the graph is greatly sped up,

  • It is possible to supply more than one data frame, both for nodes and relationships. If multiple node dataframes are used, they need to contain distinct node ids across all node data frames.

  • Prior to the construct call, a call to GraphDataScience.set_database must have been made to explicitly specify which Neo4j database should be targeted.

3. Inspecting a graph object

There are convenience methods on the graph object that let us extract information about our projected graph.

Table 1. Graph object methods
Name Arguments Return type Description

name

-

str

The name of the projected graph.

database

-

str

Name of the database in which the graph has been projected.

node_count

-

int

The node count of the projected graph.

relationship_count

-

int

The relationship count of the projected graph.

node_labels

-

list[str]

A list of the node labels present in the graph.

relationship_types

-

list[str]

A list of the relationship types present in the graph.

node_properties

label: Optional[str]

Union[Series, list[str]]

If label argument given, returns a list of the properties present on the nodes with the provided node label. Otherwise, returns a Series mapping every node label to a list of the properties present on nodes with that label.

relationship_properties

type: Optional[str]

Union[Series, list[str]]

If type argument given, returns a list of the properties present on the relationships with the provided relationship type. Otherwise, returns a Series mapping every relationship type to a list of the properties present on relationships with that type.

degree_distribution

-

Series

The average out-degree of generated nodes.

density

-

float

Density of the graph.

size_in_bytes

-

int

Number of bytes used in the Java heap to store the graph.

memory_usage

-

str

Human-readable description of size_in_bytes.

exists

-

bool

Returns True if the graph exists in the GDS Graph Catalog, otherwise False.

drop

failIfMissing: Optional[bool]

Series

Removes the graph from the GDS Graph Catalog.

configuration

-

Series

The configuration used to project the graph in memory.

creation_time

-

neo4j.time.Datetime

Time when the graph was projected.

modification_time

-

neo4j.time.Datetime

Time when the graph was last modified.

For example, to get the node count and node properties of a graph G, we would do the following:

n = G.node_count()
props = G.node_properties("MyLabel")

4. Using a graph object

The primary use case for a graph object is to pass it to algorithms, but it’s also the input to most methods of the GDS Graph Catalog.

4.1. Input to algorithms

The Python client syntax for using a Graph as input to an algorithm follows the GDS Cypher procedure API, where the graph is the first parameter passed to the algorithm.

Syntax composition:
result = gds[.<tier>].<algorithm>.<execution-mode>[.<estimate>](
  G: Graph,
  **configuration: dict[str, any]
)

4.2. The graph catalog

All procedures of the GDS Graph Catalog have corresponding Python methods in the client. Of those catalog procedures that take a graph name string as input, their Python client equivalents instead take a Graph object, with the exception of gds.graph.exists which still takes a graph name string.

Below are some examples of how the GDS Graph Catalog can be used via the client:

G, project_result = gds.graph.project(...)

stream_result = gds.graph.streamNodeProperty(G, ...)

gds.graph.drop(G)  # same as G.drop()

4.2.1. Streaming properties

The client methods

are greatly sped up if Arrow Flight Server of GDS is enabled.

Additionally, setting the client only optional keyword parameter separate_property_columns=True (it defaults to False) for gds.graph.streamNodeProperties and gds.graph.streamRelationshipProperties returns a pandas DataFrame in which each property requested has its own column. Note that this is different from the default behavior for which there would only be one column called propertyValue that contains all properties requested interleaved for each node or relationship.