Creating graphs
You can create a GDS graph from any of the following data sources:
-
A Neo4j database, using a Cypher projection, or native projection (for ease of use, but reduced flexibility)
-
An external source, via an Apache Arrow connection
-
Random data (for testing purposes)
Furthermore, the Python client provides several convenient methods to create graphs, for example from Pandas DataFrames or some well-known datasets.
Graph data model
The following describes the information that can be associated to the graphs nodes and relationships.
Nodes
Labels
A node can have zero or more labels. Labels are represented as Strings. The label can be used to filter the graph on usage, for example, to only run an algorithm on a subset of the nodes by specifying the nodeLabels parameter.
If an algorithm can distinguish between different relationship types, this is indicated by the Heterogeneous relationships trait in its documentation.
Node Properties
The Neo4j Graph Data Science Library is capable of augmenting nodes with additional properties.
These properties can be loaded from the database when the graph is projected.
Many algorithms can also persist their result as one or more node properties when they are run using the mutate
mode.
Supported types
The Neo4j Graph Data Science library does not support all property types that are supported by the Neo4j database. Every supported type also defines a fallback value, which is used to indicate that the value of this property is not set.
The following table lists the supported property types, as well as their corresponding fallback values.
Java Type | Cypher Type | Precision | Fallback value |
---|---|---|---|
Long |
Integer |
64 bit signed |
|
Double |
Float |
64 bit signed |
|
List of Long |
List of Integer |
- |
|
List of Double |
List of Float |
- |
|
1. Value of -2^63 |
Defining the type of a node property
When creating a graph projection that specifies a set of node properties, the type of these properties is automatically determined using the first property value that is read by the loader for any specified property.
All integral numerical types are interpreted as Long
values, all floating point values are interpreted as Double
values.
List values are explicitly defined by the type of the values that the array contains, for example converting a List of Integer
into a List of Long
is not supported.
Lists with mixed content types are not supported.
Automatic type conversion
Most algorithms that are capable of using node properties require a specific property type. In cases of a mismatch between the type of the provided property and the required type, the library will try to convert the property value into the required type.
The automatic conversion only happens when the conversion is loss-less. Hence, we check the following:
-
Long
toDouble
: The Long value does not exceed the supported range of the Double type. -
Double
toLong
: The Double value does not have any decimal places. -
Double[]
toFloat[]
: The Double values do not exceed the supported range of the Float type for any of the elements in the array.
The algorithm computation will fail if any of these conditions are not satisfied for any node property value.
The automatic conversion is computationally more expensive and should therefore be avoided in performance critical applications. |
Relationships
Relationships in GDS can be either directed or undirected. Also, we supported to have multiple relationships between two nodes as well as self loops. Whether you should create directed or undirected relationships depends on the semantics of the relationship as well as algorithm you want to run.
Type
A relationship has a type, which is represented as a String. The type can be used to filter the graph on usage, for example, to only run an algorithm on a subset of the relationships by specifying the relationshipTypes parameter. If an algorithm can distinguish between different relationship types, this is indicated by the Heterogeneous relationships trait in its documentation.
Direction
If an algorithm supports directed or undirected relationships can be seen by the Undirected trait and Directed trait in its documentation.
Properties
The Neo4j Graph Data Science library does not support all property types that are supported by the Neo4j database.
Specifically, GDS only supports numeric relationship properties, i.e., Long
, Double
.
Note that Long
will be converted to Double
during the projection.
If an algorithm supports relationship properties can be seen by the Weighted relationships trait in its documentation.
Relationship IDs
Relationships within a projected GDS graph are only identified by their source and target nodes. The Neo4j relationship ID is not projected and accessible by GDS. In order to access the ID of an relationship, it can be added as a property to the relationship projection.
MATCH (source)-[r]->(target)
RETURN gds.graph.project(
'graph', source, target,
{
sourceNodeLabels: labels(source),
targetNodeLabels: labels(target),
relationshipType: type(r),
relationshipProperties: { relationship_id: id(r) } // (1)
}
)
1 | The relationship id is added as a property to the projected graph. |