Creating graphs

You can create a GDS graph from any of the following data sources:

Furthermore, the Python client provides several convenient methods to create graphs, for example from Pandas DataFrames or some well-known datasets.

Graph data model

The following sections describe the information that can be associated with a graph's nodes and relationships.

Nodes

Labels

A node can have zero or more labels. Labels are represented as Strings. Labels can be used to filter the graph when it is used, for example, to run an algorithm on only a subset of the nodes by specifying the nodeLabels parameter.

If an algorithm can distinguish between different node labels, this is indicated by the Heterogeneous nodes trait in its documentation.

Node Properties

The Neo4j Graph Data Science library can augment nodes with additional properties. These properties can be loaded from the database when the graph is projected. Many algorithms can also persist their results as one or more node properties when they are run in mutate mode.

Supported types

The Neo4j Graph Data Science library does not support all property types that are supported by the Neo4j database. Every supported type also defines a fallback value, which is used to indicate that the value of this property is not set.

The following table lists the supported property types, as well as their corresponding fallback values.

Table 1. Types
Java Type       Cypher Type       Precision       Fallback value
Long            Integer           64 bit signed   Long.MIN_VALUE [1]
Double          Float             64 bit signed   Double.NaN
List of Long    List of Integer   -               null
List of Double  List of Float     -               null

1. Value of -2^63
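
The fallback values in the table can be checked for on the client side, for example after streaming property values back from a projected graph. The following is a minimal Python sketch under that assumption; the helper name is_missing is hypothetical and not part of any GDS API.

```python
import math

# Fallback values taken from the table above.
LONG_FALLBACK = -2**63  # Java Long.MIN_VALUE

def is_missing(value):
    """Return True if `value` equals the fallback for its type (hypothetical helper)."""
    if value is None:
        return True                  # list types fall back to null
    if isinstance(value, float):
        return math.isnan(value)     # NaN never compares equal to itself
    if isinstance(value, int):
        return value == LONG_FALLBACK
    return False
```

Note that the Double fallback has to be detected with math.isnan, because a comparison such as value == float("nan") is always false.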

Defining the type of a node property

When a graph projection specifies a set of node properties, the type of each property is determined automatically from the first value that the loader reads for that property. All integral numerical types are interpreted as Long values, and all floating point values are interpreted as Double values. The type of a list value is defined by the type of the elements it contains; converting, for example, a List of Integer into a List of Long is not supported. Lists with mixed content types are not supported either.
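
The inference rule described above can be illustrated with a small Python sketch. It is not the loader's implementation: the function names are hypothetical, Python's int and float stand in for integral and floating point values, and rejecting empty lists is an assumption of this sketch, since an empty list carries no element type to infer from.

```python
def _scalar_type(value):
    """Map a single value to the GDS type name it would be interpreted as."""
    # bool is a subclass of int in Python, so it must be rejected first.
    if isinstance(value, bool):
        raise TypeError("unsupported property type: bool")
    if isinstance(value, int):
        return "Long"       # all integral types are read as Long
    if isinstance(value, float):
        return "Double"     # all floating point types are read as Double
    raise TypeError(f"unsupported property type: {type(value).__name__}")

def infer_property_type(first_value):
    """Infer a property's type from the first value read for it (illustrative)."""
    if isinstance(first_value, list):
        element_types = {_scalar_type(v) for v in first_value}
        if len(element_types) != 1:
            # Covers mixed-content lists and, in this sketch, empty lists.
            raise TypeError("lists must contain exactly one element type")
        return "List of " + element_types.pop()
    return _scalar_type(first_value)
```

For example, infer_property_type([1, 2]) yields "List of Long", while infer_property_type([1, 2.0]) raises an error because the list mixes integral and floating point values.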

Automatic type conversion

Most algorithms that are capable of using node properties require a specific property type. In cases of a mismatch between the type of the provided property and the required type, the library will try to convert the property value into the required type.

The automatic conversion only happens when it is lossless. The library therefore checks the following conditions:

  • Long to Double: The Long value does not exceed the supported range of the Double type.

  • Double to Long: The Double value does not have any decimal places.

  • Double[] to Float[]: The Double values do not exceed the supported range of the Float type for any of the elements in the array.

The algorithm computation will fail if any of these conditions are not satisfied for any node property value.

The automatic conversion is computationally more expensive and should therefore be avoided in performance critical applications.
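
The three checks listed above can be sketched in Python as follows. This only illustrates the idea of a lossless conversion and is not the library's code: the function names are hypothetical, the Long to Double check is modelled as a round-trip test (an assumption about what "supported range" means), and the Float range is the largest finite 32-bit IEEE 754 value.

```python
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1
FLOAT32_MAX = (2 - 2**-23) * 2.0**127  # largest finite 32-bit float

def long_to_double(value: int) -> float:
    """Convert only if the integer survives a round trip through a double."""
    converted = float(value)
    if int(converted) != value:
        raise ValueError(f"{value} cannot be represented exactly as a Double")
    return converted

def double_to_long(value: float) -> int:
    """Convert only if the double has no decimal places and fits in 64 bits."""
    if not value.is_integer() or not (LONG_MIN <= value <= LONG_MAX):
        raise ValueError(f"{value} cannot be represented exactly as a Long")
    return int(value)

def doubles_to_floats(values):
    """Check every element against the Float range before accepting the array."""
    for v in values:
        if abs(v) > FLOAT32_MAX:
            raise ValueError(f"{v} exceeds the range of the Float type")
    # Python has no 32-bit float type, so the values are returned unchanged.
    return list(values)
```

Each function raises as soon as a single value fails its check, mirroring the statement that the computation fails if any property value violates a condition.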

Relationships

Relationships in GDS can be either directed or undirected. GDS also supports multiple relationships between the same two nodes, as well as self-loops. Whether you should create directed or undirected relationships depends on the semantics of the relationship and on the algorithm you want to run.

Type

A relationship has a type, which is represented as a String. The type can be used to filter the graph when it is used, for example, to run an algorithm on only a subset of the relationships by specifying the relationshipTypes parameter. If an algorithm can distinguish between different relationship types, this is indicated by the Heterogeneous relationships trait in its documentation.

Direction

Whether an algorithm supports directed or undirected relationships is indicated by the Directed trait and the Undirected trait in its documentation.

Properties

The Neo4j Graph Data Science library does not support all property types that are supported by the Neo4j database. Specifically, GDS only supports numeric relationship properties, i.e., Long, Double. Note that Long will be converted to Double during the projection.
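
Because a Double cannot represent every 64-bit integer exactly, very large Long values may change when they are converted. The following Python sketch shows the effect with plain floating point arithmetic; it does not call GDS, and whether this matters depends on the magnitude of your property values.

```python
exact = 2**53        # every integer up to this magnitude fits exactly in a double
weight = exact + 1   # a Long value just past that boundary

as_double = float(weight)             # what a Long-to-Double conversion would store
print(int(as_double) == weight)       # False: the value was rounded
print(int(float(exact)) == exact)     # True: this value is preserved
```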

Whether an algorithm supports relationship properties is indicated by the Weighted relationships trait in its documentation.

Relationship IDs

Relationships within a projected GDS graph are identified only by their source and target nodes. The Neo4j relationship ID is not projected and is not accessible to GDS. To make the ID of a relationship available, add it as a property in the relationship projection.

The following statement uses a Cypher projection to project a graph with the relationship ID as a property:
MATCH (source)-[r]->(target)
RETURN gds.graph.project(
    'graph', source, target,
    {
         sourceNodeLabels: labels(source),
         targetNodeLabels: labels(target),
         relationshipType: type(r),
         relationshipProperties: { relationship_id: id(r) }  //  (1)
    }
)
(1) The relationship id is added as a property to the projected graph.