Apache Arrow operations

This feature is in the alpha tier. For more information on feature tiers, see Operations reference.

The graphs in the Neo4j Graph Data Science Library support properties for nodes and relationships. One way to export those properties is using Cypher procedures. Those are documented in Node operations and Relationship operations. Similar to the procedures, GDS also supports exporting properties via Arrow Flight.

In this chapter, we assume that a Flight server has been set up and configured. To learn more about the installation, please refer to the installation chapter.

1. Arrow Ticket format

Flight streams that read properties from an in-memory graph are initiated by the Arrow client, which calls the Flight GET function (DoGet) and provides a Flight ticket. The ticket mirrors the behaviour of the procedures for streaming properties from the in-memory graph. To identify the graph and the procedure to mirror, the ticket must contain the following keys:

Name | Type | Description
graph_name | String | The name of the graph in the graph catalog.
database_name | String | The database the graph is associated with.
procedure_name | String | The mirrored property stream procedure.
configuration | Map | The procedure-specific configuration.
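
As an illustration of the ticket layout, the following minimal sketch assembles the four required keys into a byte payload. It assumes the ticket is transmitted as UTF-8 encoded JSON, which this chapter does not state explicitly; the helper name `build_ticket` is hypothetical.

```python
import json


def build_ticket(graph_name, database_name, procedure_name, configuration):
    """Serialize the four required ticket keys into bytes."""
    payload = {
        "graph_name": graph_name,
        "database_name": database_name,
        "procedure_name": procedure_name,
        "configuration": configuration,
    }
    # Assumption: the server expects the ticket as UTF-8 encoded JSON.
    return json.dumps(payload).encode("utf-8")


ticket_bytes = build_ticket(
    "my_graph",
    "database_name",
    "gds.graph.nodeProperty.stream",
    {"node_labels": ["*"], "node_property": "foo"},
)
```

The resulting bytes would then be wrapped in a Flight ticket object by whichever Arrow client library is in use.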

The following image shows the client-server interaction for exporting data using node property streaming as an example.

Figure: Client-server protocol for Arrow export in GDS

2. Stream a single node property

To stream a single node property, the client needs to encode that information in the ticket as follows:

{
    graph_name: "my_graph",
    database_name: "database_name",
    procedure_name: "gds.graph.nodeProperty.stream",
    configuration: {
        node_labels: ["*"],
        node_property: "foo"
    }
}

The procedure_name indicates that we mirror the behaviour of the existing procedure. The specific configuration needs to include the following keys:

Name | Type | Description
node_labels | String or List of Strings | Stream only properties for nodes with the given labels.
node_property | String | The node property in the graph to stream.

The schema of the result records is identical to the corresponding procedure:

Table 1. Results
Name | Type | Description
nodeId | Integer | The id of the node.
propertyValue | Integer, Float, List of Integer, or List of Float | The stored property value.
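
A client-side sketch of this request using the `pyarrow.flight` module might look as follows. The server address, the JSON ticket encoding, and the helper names `fetch_table` and `has_expected_columns` are assumptions made for illustration; authentication and TLS are omitted.

```python
import json

# Request payload taken from the ticket shown in this section.
NODE_PROPERTY_REQUEST = {
    "graph_name": "my_graph",
    "database_name": "database_name",
    "procedure_name": "gds.graph.nodeProperty.stream",
    "configuration": {"node_labels": ["*"], "node_property": "foo"},
}

# Column names from the result table of this section.
EXPECTED_COLUMNS = ["nodeId", "propertyValue"]


def fetch_table(location, request):
    """Send the ticket via the Flight GET call and read the whole stream."""
    # Imported lazily so the rest of the module works without pyarrow.
    import pyarrow.flight as flight

    # Assumption: the ticket body is UTF-8 encoded JSON.
    ticket = flight.Ticket(json.dumps(request).encode("utf-8"))
    client = flight.FlightClient(location)
    return client.do_get(ticket).read_all()  # returns a pyarrow.Table


def has_expected_columns(column_names):
    """Check that a result exposes the documented columns."""
    return list(column_names) == EXPECTED_COLUMNS


# Example call against a hypothetical local server:
# table = fetch_table("grpc://localhost:8491", NODE_PROPERTY_REQUEST)
# assert has_expected_columns(table.column_names)
```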

3. Stream multiple node properties

To stream multiple node properties, the client needs to encode that information in the ticket as follows:

{
    graph_name: "my_graph",
    database_name: "database_name",
    procedure_name: "gds.graph.nodeProperties.stream",
    configuration: {
        node_labels: ["*"],
        node_properties: ["foo", "bar", "baz"]
    }
}

The procedure_name indicates that we mirror the behaviour of the existing procedure. The specific configuration needs to include the following keys:

Name | Type | Description
node_labels | String or List of Strings | Stream only properties for nodes with the given labels.
node_properties | String or List of Strings | The node properties in the graph to stream.

Note that the schema of the result records is not identical to the corresponding procedure. Instead of a separate column containing the property key, every property is returned in its own column. As a result, there is only one row per node which includes all its property values.

For example, given the node (a { foo: 42, bar: 1337, baz: [1,3,3,7] }) and assuming node id 0 for a, the resulting record schema is as follows:

nodeId | foo | bar | baz
0 | 42 | 1337 | [1,3,3,7]
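
To make the difference between the two layouts concrete, the sketch below pivots procedure-style rows (one row per node and property key) into the one-row-per-node layout described above. The column name `nodeProperty` for the property key and the function name `pivot_to_wide` are assumptions used only for illustration.

```python
def pivot_to_wide(long_rows):
    """Group one-row-per-property records into one record per node."""
    wide = {}
    for row in long_rows:
        # Create the per-node record on first sight of the node id.
        record = wide.setdefault(row["nodeId"], {"nodeId": row["nodeId"]})
        # Each property key becomes its own column.
        record[row["nodeProperty"]] = row["propertyValue"]
    return list(wide.values())


# Procedure-style rows for the example node a (assumed node id 0).
long_rows = [
    {"nodeId": 0, "nodeProperty": "foo", "propertyValue": 42},
    {"nodeId": 0, "nodeProperty": "bar", "propertyValue": 1337},
    {"nodeId": 0, "nodeProperty": "baz", "propertyValue": [1, 3, 3, 7]},
]

wide_rows = pivot_to_wide(long_rows)
```

Here `wide_rows` holds a single record with the keys `nodeId`, `foo`, `bar`, and `baz`, matching the row shown in the table.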

4. Stream a single relationship property

To stream a single relationship property, the client needs to encode that information in the ticket as follows:

{
    graph_name: "my_graph",
    database_name: "database_name",
    procedure_name: "gds.graph.relationshipProperty.stream",
    configuration: {
        relationship_types: "REL",
        relationship_property: "foo"
    }
}

The procedure_name indicates that we mirror the behaviour of the existing procedure. The specific configuration needs to include the following keys:

Name | Type | Description
relationship_types | String or List of Strings | Stream only properties for relationships with the given type.
relationship_property | String | The relationship property in the graph to stream.

The schema of the result records is identical to the corresponding procedure:

Table 2. Results
Name | Type | Description
sourceNodeId | Integer | The source node id of the relationship.
targetNodeId | Integer | The target node id of the relationship.
relationshipType | Integer | Dictionary-encoded relationship type.
propertyValue | Float | The stored property value.

Note that the relationship type column stores the relationship type encoded as an integer. To recover the string value, look it up in the corresponding dictionary value vector, which can be loaded from the dictionary provider using the encoding id of the type field.
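
The lookup itself amounts to indexing into the dictionary values. The following sketch shows the idea on plain Python lists; the dictionary contents, the sample rows, and the function name `decode_relationship_types` are hypothetical, and how the dictionary vector is obtained depends on the Arrow client library in use.

```python
def decode_relationship_types(rows, dictionary_values):
    """Replace the integer type code in each row with its string value."""
    decoded = []
    for source, target, type_code, value in rows:
        # The integer code is the position in the dictionary value vector.
        decoded.append((source, target, dictionary_values[type_code], value))
    return decoded


# Hypothetical dictionary value vector: code 0 -> "REL", code 1 -> "KNOWS".
dictionary_values = ["REL", "KNOWS"]

# Rows in the documented column order:
# sourceNodeId, targetNodeId, relationshipType, propertyValue
encoded_rows = [(0, 1, 0, 42.0), (1, 2, 1, 0.5)]

decoded_rows = decode_relationship_types(encoded_rows, dictionary_values)
```

Some client libraries may perform this decoding automatically when they materialize a dictionary-encoded column.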

5. Stream multiple relationship properties

To stream multiple relationship properties, the client needs to encode that information in the ticket as follows:

{
    graph_name: "my_graph",
    database_name: "database_name",
    procedure_name: "gds.graph.relationshipProperties.stream",
    configuration: {
        relationship_types: "REL",
        relationship_properties: ["foo", "bar"]
    }
}

The procedure_name indicates that we mirror the behaviour of the existing procedure. The specific configuration needs to include the following keys:

Name | Type | Description
relationship_types | String or List of Strings | Stream only properties for relationships with the given type.
relationship_properties | String or List of Strings | The relationship properties in the graph to stream.

Note that the schema of the result records is not identical to the corresponding procedure. Instead of a separate column containing the property key, every property is returned in its own column. As a result, there is only one row per relationship which includes all its property values.

For example, given the relationship [:REL { foo: 42.0, bar: 13.37 }] that connects a source node with id 0 with a target node with id 1, the resulting record schema is as follows:

Table 3. Results
sourceNodeId | targetNodeId | relationshipType | foo | bar
0 | 1 | 0 | 42.0 | 13.37

Note that the relationship type column stores the relationship type encoded as an integer. To recover the string value, look it up in the corresponding dictionary value vector, which can be loaded from the dictionary provider using the encoding id of the type field.

6. Stream relationship topology

To stream the topology of one or more relationship types, the client needs to encode that information in the ticket as follows:

{
    graph_name: "my_graph",
    database_name: "database_name",
    procedure_name: "gds.graph.relationships.stream",
    configuration: {
        relationship_types: "REL"
    }
}

The procedure_name indicates that we mirror the behaviour of the existing procedure. The specific configuration needs to include the following keys:

Name | Type | Description
relationship_types | String or List of Strings | Stream only relationships with the given type.

The schema of the result records is identical to the corresponding procedure:

Table 4. Results
sourceNodeId | targetNodeId | relationshipType
0 | 1 |

Note that the relationship type column stores the relationship type encoded as an integer. To recover the string value, look it up in the corresponding dictionary value vector, which can be loaded from the dictionary provider using the encoding id of the type field.
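
Once the topology rows have been received, a client will often want them grouped per relationship type. The sketch below builds adjacency lists from rows in the documented column order; the sample rows, the dictionary contents, and the function name `build_adjacency` are hypothetical.

```python
from collections import defaultdict


def build_adjacency(rows, type_names):
    """Group (source, target, type_code) rows into adjacency lists per type."""
    adjacency = defaultdict(lambda: defaultdict(list))
    for source, target, type_code in rows:
        # Resolve the dictionary-encoded type before grouping.
        adjacency[type_names[type_code]][source].append(target)
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {rel_type: dict(nodes) for rel_type, nodes in adjacency.items()}


# Hypothetical dictionary value vector with a single type.
type_names = ["REL"]

# Rows in the documented order: sourceNodeId, targetNodeId, relationshipType.
topology_rows = [(0, 1, 0), (0, 2, 0), (1, 2, 0)]

adjacency = build_adjacency(topology_rows, type_names)
```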