data:image/s3,"s3://crabby-images/441ab/441ab128244ec60a2319eb65907d04160becc23a" alt=""
Written by Nicole White, originally posted on her blog.
What’s New in RNeo4j?
RNeo4j is Neo4j’s R driver: it lets you quickly and easily interact with a Neo4j database from your R environment. Recent updates to RNeo4j include:
- My contributions
  - Functionality for retrieving and handling paths
  - Additional sample datasets
- Community contributions
  - Open the Neo4j browser in RStudio
  - Set custom HTTP options
Paths
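Several functions have been added for retrieving and manipulating paths. These include:
- getPaths
- nodes
- rels
- shortestPath
- allShortestPaths
The examples below use this small graph of users connected by weighted LIKES relationships: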
library(RNeo4j)
# Connect to a local Neo4j server (REST endpoint on the default HTTP port 7474)
neo4j = startGraph("http://localhost:7474/db/data/")
# Create five User nodes
alice = createNode(neo4j, "User", name = "Alice")
bob = createNode(neo4j, "User", name = "Bob")
charles = createNode(neo4j, "User", name = "Charles")
david = createNode(neo4j, "User", name = "David")
elaine = createNode(neo4j, "User", name = "Elaine")
# Connect them with directed LIKES relationships, each carrying a weight property
r1 = createRel(alice, "LIKES", bob, weight = 1)
r2 = createRel(bob, "LIKES", charles, weight = 2)
r3 = createRel(bob, "LIKES", david, weight = 3)
r4 = createRel(charles, "LIKES", david, weight = 4)
r5 = createRel(alice, "LIKES", elaine, weight = 5)
r6 = createRel(elaine, "LIKES", david, weight = 6)
getPaths
getPaths retrieves a list of path objects with a Cypher query. The following query finds all paths of up to four LIKES relationships from Alice to David:
query = "
MATCH p = (u1:User)-[:LIKES*..4]->(u2:User)
WHERE u1.name = 'Alice' AND u2.name = 'David'
RETURN p
"
p = getPaths(neo4j, query)
length(p)
## [1] 3
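To see why three paths come back, here is a minimal base-R sketch that enumerates the same paths. It does not use RNeo4j or a database. Instead, it uses an in-memory edge list that mirrors the graph above. The helper find_paths is hypothetical and only illustrates what the variable-length pattern matches. One simplification: it skips nodes already on the path, while Cypher only forbids reusing a relationship. On this acyclic graph, both rules give the same result.

```r
# In-memory copy of the LIKES graph (no RNeo4j, no database involved)
edges <- data.frame(
  from   = c("Alice", "Bob",     "Bob",   "Charles", "Alice",  "Elaine"),
  to     = c("Bob",   "Charles", "David", "David",   "Elaine", "David"),
  weight = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: depth-first search collecting every directed path
# from `from` to `to` with at most `max_depth` relationships.
find_paths <- function(edges, from, to, max_depth) {
  paths <- list()
  walk <- function(path) {
    current <- path[length(path)]
    if (current == to) {
      paths[[length(paths) + 1]] <<- path  # record a complete path
      return(invisible(NULL))
    }
    # length(path) - 1 is the number of hops taken so far
    if (length(path) - 1 >= max_depth) return(invisible(NULL))
    for (nxt in edges$to[edges$from == current]) {
      # Simplification: never revisit a node already on this path
      if (!(nxt %in% path)) walk(c(path, nxt))
    }
  }
  walk(from)
  paths
}

paths <- find_paths(edges, "Alice", "David", max_depth = 4)
length(paths)  # 3
```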
nodes
nodes extracts the node objects from a path object. Because p is a list of path objects, we need to lapply through it, applying nodes to each path in the list:
n = lapply(p, nodes)
n is a list of lists. To view the names of the people on each path, for example, we can again use the apply family of functions. As length showed earlier, three paths were found. The following displays the names of the nodes on each of them:
lapply(n, function(x) sapply(x, `[[`, 'name'))
## [[1]]
## [1] "Alice" "Bob" "Charles" "David"
##
## [[2]]
## [1] "Alice" "Bob" "David"
##
## [[3]]
## [1] "Alice" "Elaine" "David"
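The `[[` idiom above works on any list of named lists, not just RNeo4j node objects, because `[[` is an ordinary R function. The following is a minimal sketch in which plain named lists stand in for nodes. These are hypothetical stand-ins, so no database is needed. It also shows a vapply variant, which errors instead of silently returning a list if a node lacks a single-string name.

```r
# Hypothetical stand-ins for RNeo4j node objects: plain named lists
node <- function(name) list(name = name)
n <- list(
  list(node("Alice"), node("Bob"), node("Charles"), node("David")),
  list(node("Alice"), node("Bob"), node("David")),
  list(node("Alice"), node("Elaine"), node("David"))
)

# sapply(x, `[[`, "name") calls x[[i]][["name"]] for each node i
names_sapply <- lapply(n, function(x) sapply(x, `[[`, "name"))

# vapply enforces that every result is exactly one character string
names_vapply <- lapply(n, function(x) vapply(x, `[[`, character(1), "name"))

identical(names_sapply, names_vapply)  # TRUE
```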
rels
rels, like nodes, extracts the relationship objects from a path object. Recall that each relationship has a weight property:
r = lapply(p, rels)
lapply(r, function(x) sapply(x, `[[`, 'weight'))
## [[1]]
## [1] 1 2 4
##
## [[2]]
## [1] 1 3
##
## [[3]]
## [1] 5 6
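Because each relationship carries a weight, a natural follow-up is to total the weights along each path, for instance to pick the cheapest one. This is a client-side computation in R, not an RNeo4j feature. In the minimal sketch below, plain lists holding the weights shown above stand in for the relationship objects. In a live session, the same expression could be applied to r from rels.

```r
# Hypothetical stand-ins for RNeo4j relationship objects, using the
# weights from the output above
rel <- function(weight) list(weight = weight)
r <- list(
  list(rel(1), rel(2), rel(4)),
  list(rel(1), rel(3)),
  list(rel(5), rel(6))
)

# Sum the weight property along each path
totals <- sapply(r, function(x) sum(sapply(x, `[[`, "weight")))
totals             # 7 4 11

# Index of the path with the smallest total weight
which.min(totals)  # 2
```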
shortestPath, allShortestPaths
shortestPath and allShortestPaths find, respectively, a single shortest path or all shortest paths between two node objects, following relationships of the given type for at most max_depth hops:
p = shortestPath(alice, "LIKES", david, max_depth = 4)
sapply(nodes(p), `[[`, 'name')
## [1] "Alice" "Bob" "David"
p = allShortestPaths(alice, "LIKES", david, max_depth = 4)
n = lapply(p, nodes)
lapply(n, function(x) sapply(x, `[[`, 'name'))
## [[1]]
## [1] "Alice" "Bob" "David"
##
## [[2]]
## [1] "Alice" "Elaine" "David"
allShortestPaths found both of these paths because they tie for shortest (both have length two). shortestPath, on the other hand, returned just one of them, chosen arbitrarily.
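The tie is easier to see with a breadth-first search, which grows all paths one hop at a time and stops at the first depth where any path reaches the target. Below is a minimal base-R sketch on an in-memory copy of the graph. The helper all_shortest is hypothetical and illustrates the idea only. It is not how RNeo4j or Neo4j implement these functions.

```r
# In-memory copy of the LIKES graph (no RNeo4j, no database involved)
edges <- data.frame(
  from = c("Alice", "Bob",     "Bob",   "Charles", "Alice",  "Elaine"),
  to   = c("Bob",   "Charles", "David", "David",   "Elaine", "David"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: breadth-first search returning every path that
# reaches `to` at the smallest possible depth (up to `max_depth`).
all_shortest <- function(edges, from, to, max_depth) {
  frontier <- list(from)  # partial paths of the current length
  for (depth in seq_len(max_depth)) {
    next_frontier <- list()
    for (path in frontier) {
      last <- path[length(path)]
      for (nxt in edges$to[edges$from == last]) {
        if (!(nxt %in% path)) {
          next_frontier[[length(next_frontier) + 1]] <- c(path, nxt)
        }
      }
    }
    # All paths hitting the target at this depth tie for shortest
    hits <- Filter(function(p) p[length(p)] == to, next_frontier)
    if (length(hits) > 0) return(hits)
    frontier <- next_frontier
  }
  list()  # no path within max_depth
}

sp <- all_shortest(edges, "Alice", "David", max_depth = 4)
sp
```

Taking sp[[1]] mimics shortestPath here, but which tied path comes first depends only on traversal order.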
Additional Sample Datasets
If you want to quickly begin exploring the capabilities of RNeo4j but don’t have a dataset, you can load one of the sample datasets shipped with the package into Neo4j with importSample. There are four datasets, ranging from travel to entertainment to social:
- Dallas/Fort Worth Airport
- Caltrain
- Movies
- Twitter
Here, I’ll import the Twitter dataset, "tweets":
importSample(neo4j, "tweets")
When you run importSample, you will have to answer a prompt confirming that it is okay to wipe your Neo4j database and import the selected dataset. You can get a decent overview of what’s been imported with summary, which shows which node labels are related and through which relationship types:
summary(neo4j)
## This To That
## 1 Tweet USING Source
## 2 Tweet MENTIONS User
## 3 Tweet RETWEETS Tweet
## 4 User POSTS Tweet
## 5 Tweet CONTAINS Link
## 6 Tweet REPLY_TO Tweet
## 7 Hashtag TAGS Tweet
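With the data loaded, you can query it with cypher. For example, the following finds the five most-mentioned users:
query = "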
MATCH (tweet:Tweet)-[:MENTIONS]->(user:User)
RETURN user.screen_name AS user, COUNT(tweet) AS mentions
ORDER BY mentions DESC
LIMIT 5
"
cypher(neo4j, query)
## user mentions
## 1 neo4j 28
## 2 ikwattro 8
## 3 _nicolemargaret 7
## 4 rvanbruggen 7
## 5 Linkurious 6
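cypher returns its result as an R data frame. To make the semantics of the Cypher aggregation concrete, here is the same COUNT, ORDER BY ... DESC, and LIMIT logic in base R. It runs on a small hypothetical edge table, not on the Twitter sample. In practice, it is usually better to leave the aggregation to Neo4j and only pull back the summarized rows.

```r
# Hypothetical toy data: one row per (tweet)-[:MENTIONS]->(user) relationship
mentions <- data.frame(
  tweet = c("t1", "t1", "t2", "t3", "t3", "t4", "t5"),
  user  = c("a",  "b",  "a",  "a",  "c",  "a",  "b"),
  stringsAsFactors = FALSE
)

# COUNT(tweet) grouped by user
counts <- table(mentions$user)

# ORDER BY mentions DESC LIMIT 2
top <- head(sort(counts, decreasing = TRUE), 2)

result <- data.frame(user = names(top), mentions = as.integer(top),
                     stringsAsFactors = FALSE)
result
```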
Open the Neo4j Browser in RStudio
A pull request by Kenneth Darrell lets you open the Neo4j browser in RStudio’s viewer pane with browse. I use this functionality all the time now, as I usually want to do a quick check that my data was imported correctly. The Twitter dataset is still in Neo4j from earlier, so I’ll often open the browser and run the query MATCH (n) RETURN n LIMIT 50 to look at a sample of the graph:
browse(neo4j)
data:image/s3,"s3://crabby-images/7ee3b/7ee3b7c05f40b761f58d472f662a2b8fec3ce4de" alt=""
Set Custom HTTP Options
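A pull request by Mark Needham lets you set custom HTTP options, which is particularly useful for setting the HTTP timeout (in seconds). These options are passed to startGraph through opts:
neo4j = startGraph("http://localhost:7474/db/data/",
                   opts = list(timeout = 2))
With a two-second timeout, a query that matches every path of up to five relationships in the graph gives up before it finishes: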
query = "
MATCH p = ()-[*..5]-() RETURN LENGTH(p)
"
test = try(cypher(neo4j, query))
cat(test[1])
## Error in function (type, msg, asError = TRUE) :
## Operation timed out after 2000 milliseconds with 327680 bytes received
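Because a timed-out request surfaces as an ordinary R error, the usual base-R error-handling tools apply. This is a minimal sketch: slow_query is a hypothetical stand-in for cypher(neo4j, query) that simply raises an error and never contacts Neo4j.

```r
# Hypothetical stand-in for a query that exceeds the HTTP timeout
slow_query <- function() {
  stop("Operation timed out after 2000 milliseconds")
}

# try() returns an object of class "try-error" instead of halting;
# silent = TRUE keeps the error from being printed to the console
test <- try(slow_query(), silent = TRUE)

if (inherits(test, "try-error")) {
  # The original condition object is stored as an attribute
  msg <- conditionMessage(attr(test, "condition"))
  message("Query failed: ", msg)
}

# tryCatch() lets you return a fallback value directly
result <- tryCatch(slow_query(), error = function(e) NA)
```

try is convenient for inspecting a failure after the fact, as in the example above, while tryCatch is the more idiomatic choice when a script should carry on with a default value.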