GDS Sessions
This Jupyter notebook is hosted in the Neo4j Graph Data Science Client GitHub repository.
The notebook shows how to use the graphdatascience Python library to create, manage, and use a GDS Session.
We consider a graph of people and fruits, which we’re using as a simple example to show how to connect your AuraDB instance to a GDS Session, run algorithms, and eventually write back your analytical results to the AuraDB database. We will cover all management operations: creation, listing, and deletion.
1. Prerequisites
This notebook requires an available AuraDB instance and the GDS sessions feature enabled for your tenant. Contact your account manager to have the feature enabled.
We also need the graphdatascience Python library installed, version 1.11 or later.
%pip install "graphdatascience>=1.11"
2. Aura API credentials
A GDS Session is created and accessed via the Aura API. In order to use the Aura API, we need to have Aura API credentials. For how to create these credentials, see the Aura documentation.
Using these credentials, we can create our GdsSessions object, which is the main entry point for managing GDS Sessions.
import os
from graphdatascience.session import GdsSessions, AuraAPICredentials
client_id = os.environ["AURA_API_CLIENT_ID"]
client_secret = os.environ["AURA_API_CLIENT_SECRET"]
# If your account is a member of several tenants, you must also specify the tenant ID to use
tenant_id = os.environ.get("AURA_API_TENANT_ID", None)
sessions = GdsSessions(api_credentials=AuraAPICredentials(client_id, client_secret, tenant_id=tenant_id))
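A missing environment variable in the cell above surfaces only as a bare KeyError, so it can help to check the required variables up front. The following is a minimal sketch of that idea; read_aura_credentials is a hypothetical helper, not part of the graphdatascience library, and it assumes only the variable names used above.

```python
import os
from typing import Mapping, Optional, Tuple

# Variable names taken from the cell above; the tenant ID is optional.
REQUIRED_VARS = ("AURA_API_CLIENT_ID", "AURA_API_CLIENT_SECRET")


def read_aura_credentials(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str, Optional[str]]:
    """Return (client_id, client_secret, tenant_id) read from the environment.

    Hypothetical helper: raises a single error that names every missing
    or empty variable instead of failing on the first one.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return (
        env["AURA_API_CLIENT_ID"],
        env["AURA_API_CLIENT_SECRET"],
        env.get("AURA_API_TENANT_ID"),
    )
```

The returned values can then be passed to AuraAPICredentials exactly as in the cell above.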
3. Creating a new session
A new session is created by calling sessions.get_or_create(). To do this, we must identify a data source. We assume here that an AuraDB instance has already been created and is accessible. We provide its Bolt address, username, and password to the DbmsConnectionInfo class.
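We also have to specify the size of the session. Many sizes are available; refer to the API reference docs or the manual for the full list. In the code below, sessions.estimate() suggests a size from the expected node and relationship counts and the algorithm categories we plan to run.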
Lastly, we need to give our session a name. We will call ours people-and-fruits. If we had already created a session with this name and wanted to reconnect to it, we would use the same code. Reconnecting does not incur any additional costs and is much faster than creating a new session.
import os
from graphdatascience.session import DbmsConnectionInfo, AlgorithmCategory
# Identify the AuraDB instance
aura_db_address = os.environ["AURA_DB_ADDRESS"]
aura_db_user = os.environ["AURA_DB_USER"]
aura_db_pw = os.environ["AURA_DB_PW"]
# Estimate the memory needed for the session
memory = sessions.estimate(
    node_count=20,
    relationship_count=50,
    algorithm_categories=[AlgorithmCategory.CENTRALITY, AlgorithmCategory.NODE_EMBEDDING],
)
# Create a GDS session!
gds = sessions.get_or_create(
    # we give it a representative name
    session_name="people-and-fruits",
    memory=memory,
    db_connection=DbmsConnectionInfo(aura_db_address, aura_db_user, aura_db_pw),
)
4. Listing sessions
Now that we have created a session, let's list all our sessions to see what that looks like:
sessions.list()
5. Adding a dataset
We assume that the configured AuraDB instance is empty. We will add our dataset using standard Cypher.
In a more realistic scenario, this step is already done, and we would just connect to the existing database.
data_query = """
CREATE
(dan:Person {name: 'Dan', age: 18, experience: 63, hipster: 0}),
(annie:Person {name: 'Annie', age: 12, experience: 5, hipster: 0}),
(matt:Person {name: 'Matt', age: 22, experience: 42, hipster: 0}),
(jeff:Person {name: 'Jeff', age: 51, experience: 12, hipster: 0}),
(brie:Person {name: 'Brie', age: 31, experience: 6, hipster: 0}),
(elsa:Person {name: 'Elsa', age: 65, experience: 23, hipster: 1}),
(john:Person {name: 'John', age: 4, experience: 100, hipster: 0}),
(apple:Fruit {name: 'Apple', tropical: 0, sourness: 0.3, sweetness: 0.6}),
(banana:Fruit {name: 'Banana', tropical: 1, sourness: 0.1, sweetness: 0.9}),
(mango:Fruit {name: 'Mango', tropical: 1, sourness: 0.3, sweetness: 1.0}),
(plum:Fruit {name: 'Plum', tropical: 0, sourness: 0.5, sweetness: 0.8})
CREATE
(dan)-[:LIKES]->(apple),
(annie)-[:LIKES]->(banana),
(matt)-[:LIKES]->(mango),
(jeff)-[:LIKES]->(mango),
(brie)-[:LIKES]->(banana),
(elsa)-[:LIKES]->(plum),
(john)-[:LIKES]->(plum),
(dan)-[:KNOWS]->(annie),
(dan)-[:KNOWS]->(matt),
(annie)-[:KNOWS]->(matt),
(annie)-[:KNOWS]->(jeff),
(annie)-[:KNOWS]->(brie),
(matt)-[:KNOWS]->(brie),
(brie)-[:KNOWS]->(elsa),
(brie)-[:KNOWS]->(jeff),
(john)-[:KNOWS]->(jeff);
"""
# making sure the database is actually empty
assert gds.run_cypher("MATCH (n) RETURN count(n)").squeeze() == 0, "Database is not empty!"
# let's now write our graph!
gds.run_cypher(data_query)
gds.run_cypher("MATCH (n) RETURN count(n) AS nodeCount")
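As a quick sanity check, the expected size of the dataset can be tallied in plain Python and compared with the counts returned by the database. This sketch only mirrors the relationships listed in the CREATE statement above; the variable names are illustrative and nothing here talks to AuraDB.

```python
# Relationships copied from the CREATE statement above, as (source, target) pairs.
likes = [
    ("Dan", "Apple"), ("Annie", "Banana"), ("Matt", "Mango"), ("Jeff", "Mango"),
    ("Brie", "Banana"), ("Elsa", "Plum"), ("John", "Plum"),
]
knows = [
    ("Dan", "Annie"), ("Dan", "Matt"), ("Annie", "Matt"), ("Annie", "Jeff"),
    ("Annie", "Brie"), ("Matt", "Brie"), ("Brie", "Elsa"), ("Brie", "Jeff"),
    ("John", "Jeff"),
]

# Every node in this dataset takes part in at least one relationship,
# so the set of endpoints is the full node set.
nodes = {name for pair in likes + knows for name in pair}

expected_node_count = len(nodes)
expected_relationship_count = len(likes) + len(knows)
print(expected_node_count, expected_relationship_count)  # prints: 11 16
```

Both numbers are well below the node_count=20 and relationship_count=50 passed to sessions.estimate() earlier, so the estimate leaves some headroom.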
6. Projecting Graphs
Now that we have imported a graph into our database, we can project it into our GDS Session. We do that by using the gds.graph.project() endpoint.
The remote projection query that we are using selects all Person nodes and their KNOWS relationships, and all Fruit nodes and their LIKES relationships. Additionally, we project node properties for illustrative purposes. These node properties can be used as input to algorithms, although we do not do that in this notebook.
G, result = gds.graph.project(
    "people-and-fruits",
    """
    CALL {
        MATCH (p1:Person)
        OPTIONAL MATCH (p1)-[r:KNOWS]->(p2:Person)
        RETURN
          p1 AS source, r AS rel, p2 AS target,
          p1 {.age, .experience, .hipster } AS sourceNodeProperties,
          p2 {.age, .experience, .hipster } AS targetNodeProperties
        UNION
        MATCH (f:Fruit)
        OPTIONAL MATCH (f)<-[r:LIKES]-(p:Person)
        RETURN
          p AS source, r AS rel, f AS target,
          p {.age, .experience, .hipster } AS sourceNodeProperties,
          f { .tropical, .sourness, .sweetness } AS targetNodeProperties
    }
    RETURN gds.graph.project.remote(source, target, {
      sourceNodeProperties: sourceNodeProperties,
      targetNodeProperties: targetNodeProperties,
      sourceNodeLabels: labels(source),
      targetNodeLabels: labels(target),
      relationshipType: type(rel)
    })
    """,
)
str(G)
7. Running Algorithms
We can now run algorithms on the projected graph. This is done using the standard GDS Python Client API. There are many other tutorials covering some interesting things we can do at this step, so we will keep it rather brief here.
We will simply run PageRank and FastRP on the graph.
print("Running PageRank ...")
pr_result = gds.pageRank.mutate(G, mutateProperty="pagerank")
print(f"Compute millis: {pr_result['computeMillis']}")
print(f"Node properties written: {pr_result['nodePropertiesWritten']}")
print(f"Centrality distribution: {pr_result['centralityDistribution']}")
print("Running FastRP ...")
frp_result = gds.fastRP.mutate(
    G,
    mutateProperty="fastRP",
    embeddingDimension=8,
    featureProperties=["pagerank"],
    propertyRatio=0.2,
    nodeSelfInfluence=0.2,
)
print(f"Compute millis: {frp_result['computeMillis']}")
# stream back the results
gds.graph.nodeProperties.stream(G, ["pagerank", "fastRP"], separate_property_columns=True, db_node_properties=["name"])
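To build some intuition for the pagerank values streamed above, here is a small pure-Python sketch of the textbook PageRank iteration on the same people-and-fruits relationships. It is not the GDS implementation: the damping factor of 0.85, the fixed 20 iterations, and the unnormalized update rule are assumptions made for illustration, so the numbers need not match pr_result exactly.

```python
from collections import defaultdict

# Directed relationships from the dataset (LIKES and KNOWS combined).
edges = [
    ("Dan", "Apple"), ("Annie", "Banana"), ("Matt", "Mango"), ("Jeff", "Mango"),
    ("Brie", "Banana"), ("Elsa", "Plum"), ("John", "Plum"),
    ("Dan", "Annie"), ("Dan", "Matt"), ("Annie", "Matt"), ("Annie", "Jeff"),
    ("Annie", "Brie"), ("Matt", "Brie"), ("Brie", "Elsa"), ("Brie", "Jeff"),
    ("John", "Jeff"),
]


def pagerank(edges, damping=0.85, iterations=20):
    """Iterate PR(n) = (1 - d) + d * sum(PR(s) / out_degree(s)) over sources s of n."""
    nodes = sorted({name for edge in edges for name in edge})
    out_degree = defaultdict(int)
    incoming = defaultdict(list)
    for source, target in edges:
        out_degree[source] += 1
        incoming[target].append(source)

    scores = {node: 1.0 - damping for node in nodes}
    for _ in range(iterations):
        # Build the next score table from the previous one (synchronous update).
        scores = {
            node: (1.0 - damping)
            + damping * sum(scores[s] / out_degree[s] for s in incoming[node])
            for node in nodes
        }
    return scores


scores = pagerank(edges)
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<7} {score:.4f}")
```

Nodes without incoming relationships, such as Dan and John, stay at the base score of 1 - damping, while nodes that collect several incoming relationships, such as Mango, end up ranked higher.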
8. Writing back to AuraDB
The GDS Session's in-memory graph was projected from data in our specified AuraDB instance. Write-back operations therefore persist data to that same AuraDB instance. Let's write back the results of the PageRank and FastRP algorithms.
# if this fails once with some error like "unable to retrieve routing table"
# then run it again. this is a transient error with a stale server cache.
gds.graph.nodeProperties.write(G, ["pagerank", "fastRP"])
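The manual "run it again" advice in the comment above can be wrapped in a small retry helper. The sketch below is generic Python, not a feature of the graphdatascience library; because the exact exception type of the transient error is not stated here, it catches any Exception, which is a deliberate simplification.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], attempts: int = 3, delay_seconds: float = 1.0) -> T:
    """Call operation() until it succeeds, at most `attempts` times.

    Hypothetical helper: re-raises the last error once all attempts are used.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise
            # Wait briefly before trying again.
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")


# Possible usage with the write call above:
# retry(lambda: gds.graph.nodeProperties.write(G, ["pagerank", "fastRP"]))
```

In real use it would be safer to catch only the specific transient exception, so that genuine errors such as a misspelled property name fail immediately.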
Of course, we can also use the .write modes of algorithms directly. Let's run Louvain in write mode to demonstrate:
gds.louvain.write(G, writeProperty="louvain")
We can now use the gds.run_cypher() method to query the updated graph. Note that the run_cypher() method runs the query on the AuraDB instance.
gds.run_cypher(
    """
    MATCH (p:Person)
    RETURN p.name, p.pagerank AS rank, p.louvain
    ORDER BY rank DESC
    """
)
9. Deleting the session
Now that we have finished our analysis, we can delete the session. The results that we produced were written back to our AuraDB instance, and will not be lost. If we computed additional things that we did not write back, those will be lost.
Deleting the session will release all resources associated with it, and stop incurring costs.
gds.delete()
# or sessions.delete("people-and-fruits")
# let's also make sure the deleted session is truly gone:
sessions.list()
# Lastly, let's clean up the database
gds.run_cypher("MATCH (n:Person|Fruit) DETACH DELETE n")
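Since a running session keeps incurring costs, it can be useful to guarantee deletion even when an analysis step raises an error. The following sketch uses a context manager for that; deleting is a hypothetical helper that assumes only that the session object has a delete() method, as gds does above.

```python
from contextlib import contextmanager


@contextmanager
def deleting(session):
    """Yield the session and always call session.delete() afterwards.

    Hypothetical helper: works with any object that exposes delete().
    """
    try:
        yield session
    finally:
        # Runs on normal exit and when the body raises.
        session.delete()


# Possible usage with the objects from this notebook:
# with deleting(sessions.get_or_create(...)) as gds:
#     ...  # project, run algorithms, write back
```

Note that contextlib.closing is not a drop-in alternative here, because it calls close() rather than delete().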