Apache Arrow operations
This feature is in the alpha tier. For more information on feature tiers, see API Tiers.
The graphs in the Neo4j Graph Data Science Library support properties for nodes and relationships. One way to export those properties is using Cypher procedures. Those are documented in Node operations and Relationship operations. Similar to the procedures, GDS also supports exporting properties via Arrow Flight.
In this chapter, we assume that a Flight server has been set up and configured. To learn more about the installation, please refer to the installation chapter.
1. Arrow Ticket format
Flight streams to read properties from an in-memory graph are initiated by the Arrow client by calling the GET
function and providing a Flight ticket.
The general idea is to mirror the behaviour of the procedures for streaming properties from the in-memory graph.
To identify the graph and the procedure that we want to mirror, the ticket must contain the following keys:
Name | Type | Description |
---|---|---|
|
String |
The name of the graph in the graph catalog. |
|
String |
The database the graph is associated with. |
|
String |
The mirrored property stream procedure. |
|
Map |
The procedure specific configuration. |
The following image shows the client-server interaction for exporting data using node property streaming as an example.

2. Stream a single node property
To stream a single node property, the client needs to encode that information in the ticket as follows:
{ graph_name: "my_graph", database_name: "database_name", procedure_name: "gds.graph.nodeProperty.stream", configuration: { node_labels: ["*"], node_property: "foo" } }
The procedure_name
indicates that we mirror the behaviour of the existing procedure.
The specific configuration needs to include the following keys:
Name | Type | Description |
---|---|---|
|
String or List of Strings |
Stream only properties for nodes with the given labels. |
|
String |
The node property in the graph to stream. |
The schema of the result records is identical to the corresponding procedure:
Name | Type | Description |
---|---|---|
nodeId |
Integer |
The id of the node. |
propertyValue |
|
The stored property value. |
3. Stream multiple node properties
To stream multiple node properties, the client needs to encode that information in the ticket as follows:
{ graph_name: "my_graph", database_name: "database_name", procedure_name: "gds.graph.streamNodeProperties", configuration: { node_labels: ["*"], node_properties: ["foo", "bar", "baz"] } }
The procedure_name
indicates that we mirror the behaviour of the existing procedure.
The specific configuration needs to include the following keys:
Name | Type | Description |
---|---|---|
|
String or List of Strings |
Stream only properties for nodes with the given labels. |
|
String or List of Strings |
The node properties in the graph to stream. |
Note that the schema of the result records is not identical to the corresponding procedure. Instead of a separate column containing the property key, every property is returned in its own column. As a result, there is only one row per node which includes all its property values.
For example, given the node (a { foo: 42, bar: 1337, baz: [1,3,3,7] })
and assuming node id 0
for a
, the resulting record schema is as follows:
nodeId | foo | bar | baz |
---|---|---|---|
0 |
42 |
1337 |
[1,3,3,7] |
4. Stream a single relationship property
To stream a single relationship property, the client needs to encode that information in the ticket as follows:
{ graph_name: "my_graph", database_name: "database_name", procedure_name: "gds.graph.relationshipProperty.stream", configuration: { relationship_types: "REL", relationship_property: "foo" } }
The procedure_name
indicates that we mirror the behaviour of the existing procedure.
The specific configuration needs to include the following keys:
Name | Type | Description |
---|---|---|
|
String or List of Strings |
Stream only properties for relationships with the given type. |
|
String |
The relationship property in the graph to stream. |
The schema of the result records is identical to the corresponding procedure:
Name | Type | Description |
---|---|---|
sourceNodeId |
Integer |
The source node id of the relationship. |
targetNodeId |
Integer |
The target node id of the relationship. |
relationshipType |
Integer |
Dictionary-encoded relationship type. |
propertyValue |
Float |
The stored property value. |
Note, that the relationship type column stores the relationship type encoded as an integer. The corresponding string value needs to be retrieved from the corresponding dictionary value vector. That vector can be loaded from the dictionary provider using the encoding id of the type field.
5. Stream multiple relationship properties
To stream multiple relationship properties, the client needs to encode that information in the ticket as follows:
{ graph_name: "my_graph", database_name: "database_name", procedure_name: "gds.graph.relationshipProperties.stream", configuration: { relationship_types: "REL", relationship_property: ["foo", "bar"] } }
The procedure_name
indicates that we mirror the behaviour of the existing procedure.
The specific configuration needs to include the following keys:
Name | Type | Description |
---|---|---|
|
String or List of Strings |
Stream only properties for relationships with the given type. |
|
String or List of String |
The relationship properties in the graph to stream. |
Note that the schema of the result records is not identical to the corresponding procedure. Instead of a separate column containing the property key, every property is returned in its own column. As a result, there is only one row per relationship which includes all its property values.
For example, given the relationship [:REL { foo: 42.0, bar: 13.37 }]
that connects a source node with id 0
wit a target node with id 1
, the resulting record schema is as follows:
sourceNodeId | targetNodeId | relationshipType | foo | bar |
---|---|---|---|---|
0 |
1 |
0 |
42.0 |
13.37 |
Note, that the relationship type column stores the relationship type encoded as an integer. The corresponding string value needs to be retrieved from the corresponding dictionary value vector. That vector can be loaded from the dictionary provider using the encoding id of the type field.
6. Stream relationship topology
To stream the topology of one or more relationship types, the client needs to encode that information in the ticket as follows:
{ graph_name: "my_graph", database_name: "database_name", procedure_name: "gds.graph.relationshipProperties.stream", configuration: { relationship_types: "REL" } }
The procedure_name
indicates that we mirror the behaviour of the existing procedure.
The specific configuration needs to include the following keys:
Name | Type | Description |
---|---|---|
|
String or List of Strings |
Stream only properties for relationships with the given type. |
The schema of the result records is identical to the corresponding procedure:
sourceNodeId | targetNodeId | relationshipType | 0 | 1 |
---|
Note, that the relationship type column stores the relationship type encoded as an integer. The corresponding string value needs to be retrieved from the corresponding dictionary value vector. That vector can be loaded from the dictionary provider using the encoding id of the type field.
Was this page helpful?