Aura Graph Analytics Serverless

Aura Graph Analytics Serverless is an on-demand ephemeral compute environment for running GDS workloads. Each compute unit is called a GDS Session. It is offered as part of Neo4j Aura, a fast, scalable, always-on, fully automated cloud graph platform.

There are three types of GDS Sessions:

Attached: the data source is a Neo4j AuraDB instance.
Self-managed: the data source is a self-managed Neo4j DBMS.
Standalone: the data source is not based on Neo4j.

The process of populating the session with data is called remote projection. Once populated, a GDS Session can run GDS workloads, such as algorithms and machine learning models. Results from these computations can be written back to the original source, using remote write-back in the Attached and Self-managed types.

For ready-to-run notebooks, see our tutorials on GDS Sessions for AuraDB, self-managed databases, and any other data source.

1. GDS Session management

The GdsSessions object is the API entry point to the following operations:

get_or_create: Create a new GDS Session, or connect to an existing one.
list: List all currently active GDS Sessions.
delete: Delete a GDS Session.

You need Neo4j Aura API credentials (CLIENT_ID and CLIENT_SECRET) to create a GdsSessions object. See the Aura documentation for instructions on how to create API credentials from your Neo4j Aura account. If your Aura user is part of multiple projects, the desired project ID must also be provided.

Creating a GdsSessions object:

from graphdatascience.session import GdsSessions, AuraAPICredentials

CLIENT_ID = "my-aura-api-client-id"
CLIENT_SECRET = "my-aura-api-client-secret"
PROJECT_ID = None

# Create a new GdsSessions object
sessions = GdsSessions(api_credentials=AuraAPICredentials(CLIENT_ID, CLIENT_SECRET, PROJECT_ID))

All available methods and parameters are listed in the API reference.

1.1. Creating a GDS Session

To create a GDS Session, use the get_or_create() method. It will create a new session if it does not exist, or connect to an existing one if it does. If the session options differ from the existing one, an error is thrown.

The return value of get_or_create() is an AuraGraphDataScience object. It offers a similar API to the GraphDataScience object, but it is configured to run on a GDS Session. As a convention, always use the variable name gds for the return value of get_or_create().

1.1.1. Session expiration and deletion

When the session is created, an optional ttl parameter can be configured to set the time after which an inactive session will expire. The default value for ttl is 1 hour and the maximum allowed value is 7 days. An expired session cannot be used to run workloads, does not cost anything, and will be deleted automatically after 7 days. It can also be deleted through the Aura Console UI.

1.1.2. Maximum lifetime

A session can never be kept active for more than 7 days. Even if the session does not expire due to inactivity, it will still expire 7 days after its creation. This is a hard limit and cannot be changed.

1.1.3. Syntax

sessions.get_or_create(
    session_name: str,
    memory: SessionMemory,
    db_connection: Optional[DbmsConnectionInfo] = None,
    ttl: Optional[timedelta] = None,
    cloud_location: Optional[CloudLocation] = None,
    timeout: Optional[int] = None,
): AuraGraphDataScience

Table 1. Parameters:
Name	Type	Optional	Default	Description
`session_name`	`str`	no	`-`	Name of the session. Must be unique within the project.
`memory`	`SessionMemory`	no	`-`	Amount of memory available to the session.
`db_connection`	`DbmsConnectionInfo`	yes	`None`	Bolt server URL, username, and password to a Neo4j DBMS. Required for the Attached and Self-managed types. Alternatively to username and password, you can provide a `neo4j.Auth` object.
`ttl`	`datetime.timedelta`	yes	`1h`	Time-to-live for the session.
`cloud_location`	`CloudLocation`	yes	`None`	Aura-supported cloud provider and region where the GDS Session will run. Required for the Self-managed and Standalone types.
`timeout`	`int`	yes	`None`	Seconds to wait for the session to enter Ready state. If the time is exceeded, an error will be returned.

1.1.4. Examples

Creating a GDS Session attached to an AuraDB instance:

from graphdatascience.session import DbmsConnectionInfo, SessionMemory

gds = sessions.get_or_create(
    session_name="my-attached-session",
    memory=SessionMemory.m_4GB,
    db_connection=DbmsConnectionInfo(
        "neo4j+s://mydbid.databases.neo4j.io",
        "my-user",
        "my-password"
    ),
)

Creating a GDS Session for a self-managed Neo4j DBMS:

from graphdatascience.session import DbmsConnectionInfo, CloudLocation, SessionMemory

gds = sessions.get_or_create(
    session_name="my-self-managed-session",
    memory=SessionMemory.m_4GB,
    db_connection=DbmsConnectionInfo("neo4j://localhost", "my-user", "my-password"),
    cloud_location=CloudLocation(provider="gcp", region="europe-west1"),
)

Creating a GDS Session without any Neo4j database:

from graphdatascience.session import CloudLocation, SessionMemory

gds = sessions.get_or_create(
    session_name="my-standalone-session",
    memory=SessionMemory.m_4GB,
    cloud_location=CloudLocation(provider="gcp", region="europe-west1"),
)

1.2. Listing GDS Sessions

The list() method returns the name and size of memory of all currently active GDS Sessions.

Listing GDS Sessions:

sessions.list()

1.3. Deleting a GDS Session

Deleting a GDS Session will terminate the session and stop any running costs from accumulating further. Deleting a session will not affect the configured Neo4j data source. However, any data not written back to the Neo4j instance will be lost.

If you have an open connection to the session:

Deleting a GDS Session via an open client connection:

gds.delete()

Use the delete() method to delete a GDS Session.

Deleting a GDS Session via the GdsSessions object:

sessions.delete(session_name="my-new-session")

1.4. Estimating session memory

In order to help determine a good session size for a given workload, there is the estimate() function. By providing expected node and relationship counts as well as algorithm categories that should be used, it will return an estimated size of the session.

Estimating the size of a GDS Session via the GdsSessions object:

from graphdatascience.session import AlgorithmCategory

memory = sessions.estimate(
    node_count=20,
    relationship_count=50,
    algorithm_categories=[AlgorithmCategory.CENTRALITY, AlgorithmCategory.NODE_EMBEDDING],
)

2. Projecting graphs into a GDS Session

Once you have a GDS Session, you can project a graph into it. This operation is called remote projection because the data source is not a co-located database, but rather a remote one.

You can create a remote projection using the gds.graph.project() endpoint with a graph name, a Cypher query, and additional optional parameters. The Cypher query must contain the gds.graph.project.remote() function to project the graph into the GDS Session. This is only possible to do with Attached and self-managed sessions. Standalone sessions must use graph.construct.

2.1. Syntax

Remote projection:

gds.graph.project(
    graph_name: str,
    query: str,
    job_id: Optional[str] = None,
    concurrency: int = 4,
    undirected_relationship_types: Optional[list[str]] = None,
    inverse_indexed_relationship_types: Optional[list[str]] = None,
    batch_size: Optional[int] = None,
): (Graph, Series[Any])

Table 2. Parameters:
Name	Type	Optional	Default	Description
`graph_name`	`str`	no	`-`	Name of the graph.
`query`	`str`	no	`-`	Projection query.
`job_id`	`str`	yes	`None`	Correlation id for the process on the session. If not provided an automatically generated id will be used.
`concurrency`	`int`	yes	`4`	Concurrency to use for building the graph within the session.
`undirected_relationship_types`	`list[str]`	yes	`[]`	List of relationship type names that should be treated as undirected.
`inverse_indexed_relationship_types`	`list[str]`	yes	`[]`	List of relationship type names that should be indexed in reverse.
`batch_size`	`int`	yes	`10000`	Size of batches transmitted from the DBMS to the session.

Table 3. Results:
Name	Type	Description
`graph`	`Graph`	Graph object representing the projected graph.
`result`	`Series[Any]`	Statistical data about the projection.

The concurrency and batch_size configuration parameters can be used to tune the performance of the remote projection.

The concurrency of the remote projection query is controlled by the Cypher runtime on the DBMS server. Use CYPHER runtime=parallel as a query prefix to maximise performance. The actual concurrency used depends on the DBMS server’s available processors and current operational load.

2.1.1. Remote projection query syntax

The remote projection query supports the same syntax as a Cypher projection, with two key differences:

The graph name is not a parameter. Instead, the graph name is provided to the gds.graph.project() endpoint.
The gds.graph.project.remote() function must be used, instead of the gds.graph.project() function.

For full details and examples on how to write Cypher projection queries, see the Cypher projection documentation in the GDS Manual.

2.1.2. Relationship type undirectedness and inverse indexing

The optional parameters undirectedRelationshipTypes and inverseIndexedRelationshipTypes are used to configure undirectedness and inverse indexing of relationships. These have the same behavior as documented in the GDS Manual.

2.2. Example

This example shows how to project a graph into a GDS Session. The example graph is heterogeneous and models users and products. Users can know each other, and users can buy products.

The Attached and Self-managed examples use a Cypher query to populate the database with the data. The Standalone example uses pandas DataFrames instead.

Create some data in the Neo4j DBMS and project it to an Attached GDS Session:

import os # for reading environment variables
from graphdatascience.session import SessionMemory, DbmsConnectionInfo, GdsSessions, AuraAPICredentials

sessions = GdsSessions(api_credentials=AuraAPICredentials(os.environ["CLIENT_ID"], os.environ["CLIENT_SECRET"]))

db_connection = DbmsConnectionInfo(os.environ["DB_URI"], os.environ["DB_USER"], os.environ["DB_PASSWORD"])
gds = sessions.get_or_create(
    session_name="my-new-session",
    memory=SessionMemory.m_8GB,
    db_connection=db_connection,
)

gds.run_cypher(
    """
    CREATE
     (u1:User {name: 'Mats'}),
     (u2:User {name: 'Florentin'}),
     (p1:Product {name: 'ice cream', cost: 4.2}),
     (p2:Product {name: 'computer', cost: 13.37})

    CREATE
     (u1)-[:KNOWS {since: 2020}]->(u2),
     (u2)-[:BOUGHT {price: 7474}]->(p1),
     (u1)-[:BOUGHT {price: 1337}]->(p2)
    """
)

G, result = gds.graph.project(
    graph_name="my-graph",
    query="""
    CALL {
        MATCH (u1:User)
        OPTIONAL MATCH (u1)-[r:KNOWS]->(u2:User)
        RETURN u1 AS source, r AS rel, u2 AS target, {} AS sourceNodeProperties, {} AS targetNodeProperties
        UNION
        MATCH (p:Product)
        OPTIONAL MATCH (p)<-[r:BOUGHT]-(user:User)
        RETURN user AS source, r AS rel, p AS target, {} AS sourceNodeProperties, {cost: p.cost} AS targetNodeProperties
    }
    RETURN gds.graph.project.remote(source, target, {
      sourceNodeProperties: sourceNodeProperties,
      targetNodeProperties: targetNodeProperties,
      sourceNodeLabels: labels(source),
      targetNodeLabels: labels(target),
      relationshipType: type(rel),
      relationshipProperties: properties(rel)
    })
    """,
)

Create some data in the Neo4j DBMS and project it to a Self-managed GDS Session:

import os # for reading environment variables
from graphdatascience.session import SessionMemory, DbmsConnectionInfo, GdsSessions, AuraAPICredentials, CloudLocation

sessions = GdsSessions(api_credentials=AuraAPICredentials(os.environ["CLIENT_ID"], os.environ["CLIENT_SECRET"]))

db_connection = DbmsConnectionInfo(os.environ["DB_URI"], os.environ["DB_USER"], os.environ["DB_PASSWORD"])
gds = sessions.get_or_create(
    session_name="my-new-session",
    memory=SessionMemory.m_8GB,
    db_connection=db_connection,
    cloud_location=CloudLocation(provider="gcp", region="europe-west1"),
)

gds.run_cypher(
    """
    CREATE
     (u1:User {name: 'Mats'}),
     (u2:User {name: 'Florentin'}),
     (p1:Product {name: 'ice cream', cost: 4.2}),
     (p2:Product {name: 'computer', cost: 13.37})

    CREATE
     (u1)-[:KNOWS {since: 2020}]->(u2),
     (u2)-[:BOUGHT {price: 7474}]->(p1),
     (u1)-[:BOUGHT {price: 1337}]->(p2)
    """
)

G, result = gds.graph.project(
    graph_name="my-graph",
    query="""
    CALL {
        MATCH (u1:User)
        OPTIONAL MATCH (u1)-[r:KNOWS]->(u2:User)
        RETURN u1 AS source, r AS rel, u2 AS target, {} AS sourceNodeProperties, {} AS targetNodeProperties
        UNION
        MATCH (p:Product)
        OPTIONAL MATCH (p)<-[r:BOUGHT]-(user:User)
        RETURN user AS source, r AS rel, p AS target, {} AS sourceNodeProperties, {cost: p.cost} AS targetNodeProperties
    }
    RETURN gds.graph.project.remote(source, target, {
      sourceNodeProperties: sourceNodeProperties,
      targetNodeProperties: targetNodeProperties,
      sourceNodeLabels: labels(source),
      targetNodeLabels: labels(target),
      relationshipType: type(rel),
      relationshipProperties: properties(rel)
    })
    """,
)

Project some data to a Standalone GDS Session:

from graphdatascience.session import CloudLocation, SessionMemory

gds = sessions.get_or_create(
    session_name="my-standalone-session",
    memory=SessionMemory.m_4GB,
    cloud_location=CloudLocation(provider="gcp", region="europe-west1"),
)

nodes = [pandas.DataFrame({
        "nodeId": [0, 1],
        "labels":  ["Person", "Person"],
    }), pandas.DataFrame({
        "nodeId": [2, 3],
        "labels":  ["Product", "Product"],
        "cost": [4.2, 13.37],
    })
]

relationships = [pandas.DataFrame({
        "sourceNodeId": [0],
        "targetNodeId": [1],
        "relationshipType": ["KNOWS"],
        "since": [2020]
    }), pandas.DataFrame({
        "sourceNodeId": [0, 1],
        "targetNodeId": [3, 2],
        "relationshipType": ["BOUGHT", "BOUGHT"],
        "price": [1337, 7474]
    })
]

G = gds.graph.construct(
    "my-graph",
    nodes,
    relationships
)

3. Running algorithms

You can run algorithms on a remotely projected graph in the same way you would on any projected graph. For instance, you can run the PageRank and FastRP algorithms on the projected graph from the previous example as follows:

Run algorithms and stream back results:

gds.pageRank.mutate(G, mutateProperty="pr")
gds.fastRP.mutate(G, featureProperties=["pr"], embeddingDimension=2, nodeSelfInfluence=0.1, mutateProperty="embedding")

# Stream the results back together with the `name` property fetched from the database
gds.graph.nodeProperties.stream(G, db_node_properties=["name"], node_properties=["pr", "embedding"])

For a full list of the available algorithms, see the API reference.

3.1. Limitations

Model Catalog is supported with limitations:
- Trained models can only be used for prediction using the same Session in which they were trained. After the Session is deleted, all trained models will be lost.
- Model publishing is not supported, including
  - gds.model.publish
- Model persistence is not supported, including
  - gds.model.store
  - gds.model.load
  - gds.model.delete
Topological Link Prediction algorithms are not supported, including
- gds.alpha.linkprediction.adamicAdar
- gds.alpha.linkprediction.commonNeighbors
- gds.alpha.linkprediction.preferentialAttachment
- gds.alpha.linkprediction.resourceAllocation
- gds.alpha.linkprediction.sameCommunity
- gds.alpha.linkprediction.totalNeighbors

4. Remote write-back

Persisting the results of a computation done in a GDS Session differs by the session’s type. Attached and Self-managed sessions come with built-in support for writing back algorithms results to the same Neo4j DB where the graph was projected from. Users of Standalone sessions have to stream the results back to the client and the user has to persist it in their target system. This section will illustrate the built-in remote write-back capability.

By default, write back will happen concurrently, in one transaction per batch. The behaviour is controlled by three aspects:

the size of the dataset (e.g., node count or relationship count)
the configured batch size
the configured concurrency

4.1. Syntax

The syntax for remote write-back is identical for Attached and Self-managed sessions.

Remote graph write-back:

gds.graph.<operation>.write(
    graph_name: str,
    # additional parameters,
    **config: Any,
): Series[Any]

Remote algorithm write-back:

gds.<algo>.write(
    graph_name: str,
    **config: Any,
): Series[Any]

All write-back endpoints support the following additional configuration:

Table 4. Parameters:
Name	Optional	Default	Description
`concurrency`	yes	dynamic ^[1]	Concurrency to use for writing back to the DBMS.
`arrowConfiguration`	yes	-	Dict containing additional configuration for the connection from the DBMS to the GDS Arrow Server.
1. Twice the number of processors on the DBMS server

Table 5. Arrow configuration:
Name	Optional	Default	Description
`batchSize`	yes	`10000`	Size of batches retrieved by the DBMS from the session.

4.2. Examples

Extending the previous example, we can write back the FastRP embeddings to the Neo4j DB as follows:

Write mutated FastRP embeddings back to the database:

gds.graph.nodeProperties.write(G, "embedding")

If we want to tune the performance of the write-back, we can configure batchSize and concurrency. In this example we show how to do this with an algorithm .write mode:

Compute WCC and write the component ids back as node properties, with custom concurrency configuration:

gds.wcc.write(
  G,
  writeProperty="wcc",
  concurrency=12,
  arrowConfiguration={"batchSize": 25000}
)

5. Querying the database

You can run Cypher queries on the Neo4j DB using the run_cypher() method. There is no restriction on the type of query that can be run, but it is important to note that the query will be run on the Neo4j DB, and not on the GDS Session.

If you want to use Cypher to operate Graph Analytics Serverless use the Cypher API.

Run a Cypher query to find our written-back embeddings:

gds.run_cypher("MATCH (n:User) RETURN n.name, n.embedding")