Estimating memory usage and resizing an instance
This example shows how to:
- use the memory estimation mode to estimate the memory requirements of an algorithm before running it
- resize an AuraDS instance to accommodate the algorithm's memory requirements
Setup
For more information on how to get started using Python, refer to the Connecting with Python tutorial.
pip install graphdatascience
# Import the client
from graphdatascience import GraphDataScience
# Replace with the actual URI, username, and password
AURA_CONNECTION_URI = "neo4j+s://xxxxxxxx.databases.neo4j.io"
AURA_USERNAME = "neo4j"
AURA_PASSWORD = ""
# Configure the client with AuraDS-recommended settings
gds = GraphDataScience(
    AURA_CONNECTION_URI,
    auth=(AURA_USERNAME, AURA_PASSWORD),
    aura_ds=True
)
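Hardcoding a password in a script or notebook makes it easy to leak. As a minimal sketch, the connection details could instead be read from the same environment variable names that the Cypher Shell instructions in this example export. The helper name `load_aura_credentials` is hypothetical and not part of the GDS client.

```python
import os


def load_aura_credentials(env=None):
    """Read AuraDS connection details from environment variables.

    Hypothetical helper: the variable names mirror the ones exported
    for Cypher Shell in this example.
    """
    env = os.environ if env is None else env
    uri = env.get("AURA_CONNECTION_URI")
    password = env.get("AURA_PASSWORD")
    if not uri or not password:
        raise RuntimeError(
            "Set AURA_CONNECTION_URI and AURA_PASSWORD before connecting"
        )
    # "neo4j" is the username used throughout this example
    username = env.get("AURA_USERNAME", "neo4j")
    return uri, username, password
```

The returned values can then be passed to `GraphDataScience(uri, auth=(username, password), aura_ds=True)` in place of the constants defined above.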
In the following code examples we use the print function to print Pandas DataFrame and Series objects. You can try different ways to print a Pandas object, for instance via the to_string and to_json methods; if you use a JSON representation, in some cases you may need to include a default handler to handle Neo4j DateTime objects. Check the Python connection section for some examples.
For more information on how to get started using the Cypher Shell, refer to the Neo4j Cypher Shell tutorial.
Run the following commands from the directory where the Cypher Shell is installed.
export AURA_CONNECTION_URI="neo4j+s://xxxxxxxx.databases.neo4j.io"
export AURA_USERNAME="neo4j"
export AURA_PASSWORD=""
./cypher-shell -a $AURA_CONNECTION_URI -u $AURA_USERNAME -p $AURA_PASSWORD
For more information on how to get started using Python, refer to the Connecting with Python tutorial.
pip install neo4j
# Import the driver
from neo4j import GraphDatabase
# Replace with the actual URI, username, and password
AURA_CONNECTION_URI = "neo4j+s://xxxxxxxx.databases.neo4j.io"
AURA_USERNAME = "neo4j"
AURA_PASSWORD = ""
# Instantiate the driver
driver = GraphDatabase.driver(
    AURA_CONNECTION_URI,
    auth=(AURA_USERNAME, AURA_PASSWORD)
)
# Import to prettify results
import json
# Import for the JSON helper function
from neo4j.time import DateTime
# Helper function for serializing Neo4j DateTime in JSON dumps
def default(o):
    if isinstance(o, DateTime):
        return o.isoformat()
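The `default` argument of `json.dumps` is called only for values that the encoder cannot serialize on its own. The sketch below illustrates the mechanism with the standard library `datetime` as a stand-in for the Neo4j `DateTime` type (both expose `isoformat()`), so it runs without a database. The sample record is made up for illustration, and this variant raises a `TypeError` for unknown types rather than returning nothing, which is a design choice and not what the helper above does.

```python
import json
from datetime import datetime


def default(o):
    # Stand-in for the Neo4j DateTime check used in this example
    if isinstance(o, datetime):
        return o.isoformat()
    # Report anything else instead of silently writing null
    raise TypeError(f"Cannot serialize {type(o).__name__}")


# Hypothetical record containing a temporal value
record = {"name": "example-graph", "created": datetime(2024, 1, 2, 3, 4, 5)}
serialized = json.dumps(record, sort_keys=True, default=default)
print(serialized)
```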
Create an example graph
An easy way to create an in-memory graph is through the GDS graph generation algorithm. By specifying the number of nodes, the average number of relationships going out of each node, and the relationship distribution function, the algorithm creates a graph with the following shape:
(:50000000_Nodes)-[:REL]→(:50000000_Nodes)
# Run the graph generation algorithm and retrieve the corresponding
# graph object and call result metadata
g, result = gds.beta.graph.generate(
    "example-graph",
    50000000,
    3,
    relationshipDistribution="POWER_LAW"
)
# Print prettified graph stats
print(result)
CALL gds.beta.graph.generate(
'example-graph',
50000000,
3,
{relationshipDistribution: 'POWER_LAW'}
)
YIELD name,
nodes,
relationships,
generateMillis,
relationshipSeed,
averageDegree,
relationshipDistribution,
relationshipProperty
RETURN *
# Cypher query
create_example_graph_query = """
CALL gds.beta.graph.generate(
'example-graph',
50000000,
3,
{relationshipDistribution: 'POWER_LAW'}
)
YIELD name,
nodes,
relationships,
generateMillis,
relationshipSeed,
averageDegree,
relationshipDistribution,
relationshipProperty
RETURN *
"""
# Create the driver session
with driver.session() as session:
    # Run query
    result = session.run(create_example_graph_query).data()
    # Prettify the result
    print(json.dumps(result, indent=2, sort_keys=True, default=default))
The graph is fairly large, so the generation procedure will take a few minutes to complete.
Run the estimate mode
The estimation of the memory requirements of an algorithm on an in-memory graph can be useful to determine whether the current AuraDS instance has enough resources to run the algorithm to completion.
The Graph Data Science library has guard rails built in: if an algorithm is estimated to use more RAM than is available, an exception is raised. In this case, the AuraDS instance can be resized before running the algorithm again.
In the following example we get a memory estimation for the Label Propagation algorithm to run on the generated graph. The estimated memory is between 381 MiB and 4477 MiB; the upper bound exceeds the memory available on an 8 GB instance (4004 MiB).
result = gds.labelPropagation.mutate.estimate(
    g,
    mutateProperty="communityID"
)
print(result)
CALL gds.labelPropagation.mutate.estimate(
'example-graph',
{mutateProperty: 'communityID'}
)
YIELD nodeCount,
relationshipCount,
bytesMin,
bytesMax,
requiredMemory
RETURN *
# Cypher query
label_propagation_mutate_estimate_example_graph_query = """
CALL gds.labelPropagation.mutate.estimate(
'example-graph',
{mutateProperty: 'communityID'}
)
YIELD nodeCount,
relationshipCount,
bytesMin,
bytesMax,
requiredMemory
RETURN *
"""
# Create the driver session
with driver.session() as session:
    # Run query
    results = session.run(label_propagation_mutate_estimate_example_graph_query).data()
    # Prettify the result
    print(json.dumps(results, indent=2, sort_keys=True))
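The `bytesMax` value returned by the estimate can be compared against the memory available on the instance before running the algorithm. The following is a minimal sketch of that check, using the figures quoted above (4477 MiB estimated maximum, 4004 MiB available on an 8 GB instance). The helper names are hypothetical, and the byte values are derived from those rounded figures rather than taken from an actual run.

```python
MIB = 1024 * 1024


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / MIB


def fits_in_memory(bytes_max, available_bytes):
    """Return True if the worst-case estimate fits in the available memory."""
    return bytes_max <= available_bytes


# Figures quoted in this example, converted to bytes for illustration
estimated_max = 4477 * MIB
available = 4004 * MIB

if not fits_in_memory(estimated_max, available):
    shortfall = to_mib(estimated_max - available)
    print(f"Estimate exceeds available memory by {shortfall:.0f} MiB")
```

With a real estimate, `estimated_max` would presumably come from the `bytesMax` field of the returned result instead of a constant.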
The mutate procedure hits the guard rails on an 8 GB instance, raising an exception that suggests resizing the AuraDS instance.
result = gds.labelPropagation.mutate(
    g,
    mutateProperty="communityID"
)
print(result)
CALL gds.labelPropagation.mutate(
'example-graph',
{mutateProperty: 'communityID'}
)
YIELD preProcessingMillis,
computeMillis,
mutateMillis,
postProcessingMillis,
nodePropertiesWritten,
communityCount,
ranIterations,
didConverge,
communityDistribution,
configuration
RETURN *
# Cypher query
label_propagation_mutate_example_graph_query = """
CALL gds.labelPropagation.mutate(
'example-graph',
{mutateProperty: 'communityID'}
)
YIELD preProcessingMillis,
computeMillis,
mutateMillis,
postProcessingMillis,
nodePropertiesWritten,
communityCount,
ranIterations,
didConverge,
communityDistribution,
configuration
RETURN *
"""
# Create the driver session
with driver.session() as session:
    # Run query
    results = session.run(label_propagation_mutate_example_graph_query).data()
    # Prettify the result
    print(json.dumps(results, indent=2, sort_keys=True))
Resize the AuraDS instance
You will need to resize the instance to the next available size (16 GB) in order to continue. An AuraDS instance can be resized from the Neo4j Aura Console homepage. For more information, check the Instance actions section.
Resizing an AuraDS instance incurs a short amount of downtime.
After resizing, wait a few seconds until the projected graph is reloaded, then run the mutate step again.
This time no exception is thrown and the step completes successfully.
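Instead of waiting a fixed number of seconds after the resize, the wait could be expressed as a polling loop. The sketch below is generic: `wait_until` is a hypothetical helper, and the check passed to it is an assumption (for instance, a function that asks GDS whether `example-graph` exists). A stand-in check is used here so the example runs without an instance.

```python
import time


def wait_until(check, timeout=60.0, interval=2.0, sleep=time.sleep):
    """Call check() until it returns True or the timeout elapses.

    Returns True if the check succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Stand-in check that succeeds on the third call
attempts = {"count": 0}


def graph_is_loaded():
    attempts["count"] += 1
    return attempts["count"] >= 3


ready = wait_until(graph_is_loaded, timeout=5.0, interval=0.01)
```

A real check may also need to tolerate connection errors while the instance restarts, since resizing involves a short period of downtime.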
Cleanup
The in-memory graph can now be deleted.
result = gds.graph.drop(g)
print(result)
CALL gds.graph.drop('example-graph')
delete_example_in_memory_graph_query = """
CALL gds.graph.drop('example-graph')
"""
with driver.session() as session:
    # Run query
    results = session.run(delete_example_in_memory_graph_query).data()
    # Prettify the results
    print(json.dumps(results, indent=2, sort_keys=True, default=default))
Closing the connection
The connection should always be closed when no longer needed.
Although the GDS client automatically closes the connection when the object is deleted, it is good practice to close it explicitly.
# Close the client connection
gds.close()
# Close the driver connection
driver.close()
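To make sure the connection is closed even if an error occurs midway, the close call could be tied to a `with` block. The following is a minimal sketch using `contextlib.closing`, which works with any object that exposes a `close()` method. The `FakeClient` class is a stand-in so the example runs without a connection.

```python
from contextlib import closing


class FakeClient:
    """Stand-in for a client or driver object with a close() method."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


client = FakeClient()
try:
    with closing(client):
        # Work with the connection here; errors still trigger close()
        raise ValueError("simulated failure")
except ValueError:
    pass
```

After the block exits, `client.closed` is `True` even though the body raised an exception.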
References
Cypher
- Learn more about the Cypher syntax
- You can use the Cypher Cheat Sheet as a reference of all available Cypher features