Migrate to Aura Graph Analytics
This guide provides an overview of the prerequisites and preparation steps needed to migrate existing graph data science workloads to Aura Graph Analytics (AGA) from AuraDS or the AuraDB Pro Graph Analytics plugin.
Overview
The main difference when migrating to AGA is the shift from a persistent compute model to an on-demand (ephemeral) session model. This change results in the architectural differences and operational considerations summarized below.
| | AuraDS / Graph Analytics plugin | Aura Graph Analytics |
|---|---|---|
| Compute resource | Always-on instance (AuraDS) | Sessions created on demand for the duration of the task; no shared resources |
| Data location | Data resides on the same machine as the compute | Data is loaded from your source (Neo4j DB or external) into the session's memory |
| Cost model | Flat hourly rate (AuraDS) | Pay-as-you-go (billed only while the session is active) |
| | AuraDS / Graph Analytics plugin | Aura Graph Analytics |
|---|---|---|
| Authentication | Database credentials | Implicit (from the Aura Console) or Aura API credentials (for external usage and scripts) |
| Maximum running time | Indefinite (until the instance is deleted) | Hard limit of 7 days (168 hours), which can interrupt long-running workflows |
| Provisioning | Instance already up and running | Not instantaneous; provisioning can take from a few seconds to a few minutes |
| File system access | File export procedures can write to the local file system | No access to the local file system; results must be streamed back[1] to the session client first |

1. You can use `gds.graph.nodeProperties.stream` and `gds.graph.relationshipProperties.stream` to retrieve node and relationship properties from a remote projected graph.
Code changes
Authentication
Aura Graph Analytics requires authentication with the Aura API to launch sessions. The Aura API credentials are different from the Neo4j database credentials.
- If you launch your session through the Query tab in the Aura console, you are implicitly authorized to create analytics sessions and do not need to generate or provide Aura API keys. You only need the Neo4j database credentials to read your data.
- If you launch your session with the GDS Python client or via an external script, you must explicitly authenticate with the Aura API first. Include the credentials in your script before the first call to the session creation method.

Save the credentials immediately, as they cannot be viewed again after creation.
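If you drive sessions from an external script, a common pattern is to load the Aura API credentials from environment variables rather than hard-coding them. Below is a minimal sketch; the variable names are a convention chosen for this example, not mandated by the client:

```python
import os

def load_aura_api_credentials():
    """Read Aura API credentials from environment variables.

    The variable names here are illustrative; choose whatever naming
    convention fits your deployment.
    """
    creds = {
        "client_id": os.environ.get("AURA_API_CLIENT_ID"),
        "client_secret": os.environ.get("AURA_API_CLIENT_SECRET"),
        # Optional if your account has only one project
        "project_id": os.environ.get("AURA_API_PROJECT_ID"),
    }
    missing = [k for k, v in creds.items() if v is None and k != "project_id"]
    if missing:
        raise RuntimeError(f"Missing Aura API credentials: {', '.join(missing)}")
    return creds
```

The returned dict can then be unpacked into the credentials object (for example, `AuraAPICredentials(**creds)`) before the first session-creation call.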
Python client
If you use the graphdatascience Python library, you must update how you instantiate the GDS object and how you project graphs.
```python
from graphdatascience import GraphDataScience

# Direct connection to the database/instance
gds = GraphDataScience(
    "neo4j+s://my-aurads-instance.databases.neo4j.io",
    auth=("neo4j", "password")
)
```
Use the GdsSessions class to authenticate with the Aura API and spawn a session.
```python
from graphdatascience.session import (
    AuraAPICredentials,
    DbmsConnectionInfo,
    GdsSessions,
    SessionMemory,
)

# Authenticate with the Aura API
sessions = GdsSessions(
    api_credentials=AuraAPICredentials(
        client_id="<YOUR_CLIENT_ID>",
        client_secret="<YOUR_CLIENT_SECRET>",
        project_id="<YOUR_PROJECT_ID>"  # Optional if you have only one project
    )
)

# Define the source database (if loading from AuraDB)
db_connection = DbmsConnectionInfo(
    aura_instance_id="<YOUR_INSTANCE_ID>",
    username="neo4j",
    password="<YOUR_PASSWORD>"
)

# Get or create the session
# This returns a 'gds' object configured to run on the remote session
gds = sessions.get_or_create(
    session_name="my-analytics-job",
    memory=SessionMemory.m_16GB,
    db_connection=db_connection
)

# Use the 'gds' object to run any commands directly on the session
# Optional: use 'gds.verify_connectivity()' to verify that the
# connection is correctly configured
```
With Aura Graph Analytics you cannot use a standard native projection such as `gds.graph.project("my-graph", "Person", "KNOWS")`. You must use a remote projection to pull data over the network.
From AuraDB or self-managed Neo4j (Remote projection)
If the session reads data from a Neo4j source, you must use gds.graph.project with a query that specifically calls gds.graph.project.remote.
This procedure moves data from the source DB to the session.
```python
# The query runs on your DB, but projects into the session
query = """
MATCH (n:Person)-[:KNOWS]->(m:Person)
RETURN gds.graph.project.remote(
  n,
  m
)
"""

G, result = gds.graph.project(
    graph_name="my-graph",
    query=query
)
```
From a Pandas DataFrame (Standalone)
If your data source is not Neo4j (for example, Pandas DataFrames), there is no connected database to run Cypher against. In this case, use gds.graph.construct to load the data directly from in-memory structures.
```python
import pandas

# Create DataFrames for nodes and relationships
nodes = pandas.DataFrame({
    "nodeId": [1, 2, 3],
    "labels": ["Person", "Person", "Person"],
    "age": [25, 30, 35]
})
rels = pandas.DataFrame({
    "sourceNodeId": [1, 2],
    "targetNodeId": [2, 3],
    "relationshipType": ["KNOWS", "KNOWS"]
})

# Construct the graph in the session
G = gds.graph.construct(
    "my-dataframe-graph",
    nodes,
    rels
)
```
Cleanup
With an always-on AuraDS instance, no cleanup is required. With Aura Graph Analytics, you must delete the session when you no longer need it to stop billing. If you do not delete it explicitly, the session is still deleted automatically when it reaches its TTL.
You can either delete the session explicitly by referencing the session name or call delete() on the gds object.
```python
# Delete the session by name
sessions.delete(session_name="my-analytics-job")

# Or delete the session via the 'gds' object
gds.delete()
```
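To guarantee the session is deleted even when the workload raises an exception, wrap the job in try/finally. The helper below is a sketch written for this guide; `sessions` is expected to expose `get_or_create` and `delete` as the GdsSessions class does:

```python
def run_with_cleanup(sessions, session_name, memory, db_connection, job):
    """Create a session, run 'job' on it, and always delete the session.

    'sessions' is any object shaped like GdsSessions; 'job' is a callable
    receiving the session-backed 'gds' object.
    """
    gds = sessions.get_or_create(
        session_name=session_name, memory=memory, db_connection=db_connection
    )
    try:
        return job(gds)
    finally:
        # Runs on success and on exceptions alike, so a crashed script
        # does not leave a billed session behind
        sessions.delete(session_name=session_name)
```

This pattern complements, rather than replaces, a TTL: the TTL is the safety net for the cases try/finally cannot cover, such as the whole process being killed.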
Cypher
If you run GDS algorithms directly via Cypher in Aura Console, you only need to update your projection query and include the configuration parameters that trigger the on-demand session.
```cypher
// Previous approach: native projection on an always-on instance
CALL gds.graph.project('my-graph', 'Person', 'KNOWS');
CALL gds.pageRank.stream('my-graph') YIELD nodeId, score;
```
By default, when you are logged into an AuraDB instance, you are automatically authorized to create and manage your own analytics sessions. You do not need to handle API credentials manually, and only need to add a configuration map with a memory key to your projection call. This triggers the remote session creation.
```cypher
// Project and create session
// The 'memory' configuration setting automatically triggers the
// creation of the remote session
MATCH (n:Person)-[:KNOWS]->(m:Person)
RETURN gds.graph.project(
  'my-graph',
  n,
  m,
  {}, // (Optional) Node/relationship properties config
  { memory: '8GB', ttl: duration({hours: 1}) } // Session configuration
);

// Run algorithms in the same way
CALL gds.pageRank.stream('my-graph')
YIELD nodeId, score;

// Drop the graph to delete the session
CALL gds.graph.drop('my-graph');
```
Best practices
Organization limits for AGA
If you try to create more sessions than your organization settings allow, or a session larger than the permitted maximum, the request is rejected.
If your workload is likely to exceed any of these limits, you may need to request a quota increase.
Data transfer costs
Be aware of any data egress fees from your cloud provider when loading data across regions or from an external cloud provider. Where possible, co-locate your session in the same region as your data source.
Memory size estimation
Since memory is fixed per session at creation time, a session can hit Out-Of-Memory (OOM) errors if the workload needs more memory than was requested.
To prevent this, the Python client has an estimate method to calculate the memory requirement before creating a session.
The Cypher API does not include an estimation procedure, so you may need to adjust the memory parameter by trial and error.
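When sizing by trial and error, it helps to round an estimated requirement up to the next available tier with some headroom rather than requesting the exact figure. The helper below is a rough illustration written for this guide; the tier list is an assumption, so check the session sizes actually offered in your Aura project (or use the Python client's estimate method where available):

```python
# Hypothetical tier list for illustration only; the sizes available
# to your project may differ.
SESSION_TIERS_GB = [4, 8, 16, 32, 64, 128]

def pick_session_memory(estimated_gb, tiers=SESSION_TIERS_GB, headroom=1.25):
    """Return the smallest tier covering the estimate plus a safety margin.

    'headroom' keeps the session from running at the edge of an
    Out-Of-Memory error when the estimate is slightly off.
    """
    needed = estimated_gb * headroom
    for tier in sorted(tiers):
        if tier >= needed:
            return tier
    raise ValueError(f"No tier covers {needed:.1f} GB; request a quota increase")
```

For example, an estimated 13 GB requirement with 25% headroom needs 16.25 GB, which rounds up to the 32 GB tier in this hypothetical list.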
Session Time-To-Live
Always set a Time-To-Live (TTL) limit for your sessions to prevent accidental costs, for example if a script crashes before it can tear down the session. If you do not set one, the default TTL value is 1 hour.
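In the Python client a TTL can be expressed as a `datetime.timedelta`. A small guard like the one below, written for this guide, keeps the value inside the limits stated above (1-hour default, 168-hour hard stop):

```python
from datetime import timedelta

HARD_STOP = timedelta(hours=168)  # sessions never outlive this limit
DEFAULT_TTL = timedelta(hours=1)  # applied when no TTL is set

def validated_ttl(ttl=None):
    """Return a TTL suitable for session creation.

    Falls back to the 1-hour default and rejects values beyond the
    7-day hard stop, where the session would be terminated anyway.
    """
    if ttl is None:
        return DEFAULT_TTL
    if ttl > HARD_STOP:
        raise ValueError(f"TTL {ttl} exceeds the 168-hour hard limit")
    return ttl
```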
Long-running workflows
A session terminates automatically when its duration hits the hard stop limit (168 hours), regardless of whether a job is currently running.
If your workflow duration is likely to exceed the limit, you must either reduce its running time or save intermediate results externally.
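One way to save intermediate results externally is to stream them out of the session in batches and checkpoint them to durable storage before the hard stop. The sketch below writes streamed rows to a CSV file with the standard library; in practice the rows would come from a streaming call such as gds.graph.nodeProperties.stream:

```python
import csv

def checkpoint_rows(rows, path, fieldnames):
    """Append result rows (dicts) to a CSV checkpoint file.

    Writes the header only when the file is empty, so the function can
    be called repeatedly as batches of results are streamed back.
    """
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if f.tell() == 0:
            writer.writeheader()
        writer.writerows(rows)
```

A follow-up session can then reload the checkpoint (for example, into a Pandas DataFrame for gds.graph.construct) and continue where the previous one stopped.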
Session startup latency
Since creating a new session involves provisioning new compute infrastructure and is not instantaneous, do not create sessions inside synchronous or latency-sensitive user requests such as API calls. Either create a session beforehand or use an asynchronous job queue instead.
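Provisioning ahead of time can be as simple as starting session creation on a background thread and handing the ready object to request handlers once the future resolves. A sketch with the standard library; the session factory here is a stand-in for a wrapper around sessions.get_or_create:

```python
from concurrent.futures import ThreadPoolExecutor

def provision_in_background(create_session):
    """Start session creation without blocking the caller.

    'create_session' is any zero-argument callable returning a ready
    session handle (e.g. a closure over sessions.get_or_create).
    The returned future resolves once provisioning completes.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(create_session)
    # Let the worker thread finish on its own; the future stays valid
    executor.shutdown(wait=False)
    return future
```

Request handlers then call `future.result()` and only block if provisioning has not finished yet, instead of paying the full startup latency on every request.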