One of the most crucial pieces of Neo4j.rb is
Neo4j::ActiveNode::Query::QueryProxy
, but it may be the area of the gem that is the least spelled out in our documentation. Sure, many of its methods are described and it’s tested extensively, but it and its relationship to
Neo4j::Core::Query
is often a source of confusion. I think there are a few reasons for this — a sense of shared responsibilities, similar or often identical methods, and maybe too much inference in the docs — but whatever the cause(s), I’d like to offer a bit of a primer and some tips.
It gets tricky to explain QueryProxy without getting too technical and I don’t want to do a crappy job of explaining what specs and source accurately describe, so I’m going to try to keep this relatively high-level. For a more technical overview, check out the specs for the classes in their respective Github repos,
neo4jrb/neo4jand
neo4jrb/neo4j-core.
Neo4j::Core::QueryCore::Query is the Cypher DSL in the
Neo4j-core gem. It was built by Brian as part of our 3.0 release and powers almost everything Cypher-related. As a Cypher DSL, it has methods that match almost every function in Cypher:
match(s: Student)
generates
MATCH (s:Student)
,
where(s: { age: 30 })
generates
WHERE s.age = 30
, and so on. It is chainable, so
match(s:Student).where(s: { age: 30 })
generates
MATCH (s:Student) WHERE s.age = 30
.
Core::Query is powerful, complex, and adaptable. Almost every method will accept a string, a symbol, or a hash. It understands the order that functions must be given and will often rearrange the query it generates to ensure proper syntax, but you can use the
break
method to keep things in the order called. If you only had Neo4j-core, you could still do everything that you do in Neo4j.rb, it would just take a bit longer because you wouldn’t have ActiveNode… but you could still work.
Really, you should check out the
wiki, specs and
YARD docs, because they spell everything out.
Neo4j::ActiveNode::Query::QueryProxy
QueryProxy, also built by Brian for the 3.0 release, is a class that implements methods to build Core::Query objects and chains. It powers everything in Neo4j.rb from node and relationship creation to methods like
first
and
where
to matches using associations. When you call
student.lessons << math
, you are using QueryProxy. When you call
student.lessons
, you are returning a QueryProxy. Same goes for
Student.all
,
Student.find
, and so on.
At this point, you might be wondering what the point of QueryProxy is, if it’s just shortcuts to Cypher. Why is it in Neo4j.rb instead of Neo4j-core? The reason is because QueryProxy has a fundamentally different focus: Core::Query is a catch-all, “write-Cypher-with-Ruby” class, but QueryProxy has a model-centric focus and does not aim to be everything for everyone. It understands that it does not need to handle every Cypher query, since that’s why Core::Query exists; instead, it provides convenience methods that leverage ActiveNode models and objects to let you work faster.
We could (and maybe we should?) fill a book about using QueryProxy by itself. Some examples, though…
Student.all.to_a
Neo4j::Core::Query.new.match(s: Student).pluck(:s)
Student.as(:s).all.to_a
Student.lessons.to_a
Neo4j::Core::Query.new.match(s: Student, l: Lesson).match("(s)-[r:ENROLLED_IN]->(l)").pluck(:l)
Student.as(:s).lessons.where(level: 102).pluck(:s)
Neo4j::Core::Query.new.match(s: Student, l: Lesson)
.match("(s)-[r:ENROLLED_IN]->(l { level: {level} })")
.params(level: 102)
.pluck(:s)
Student.where(age: 14).lessons(:l)
.where(level: 102).rel_where(grade: 99)
.teachers.where(name: 'Mr Green')
.pluck(:l)
Neo4j::Core::Query.new.match(s: Student, l: Lesson, t: Teacher)
.where(s: { age: 14 }, l: { level: 102 }, t: { name: 'Mr Green'})
.match("(s)-[r:ENROLLED_IN { grade: 99 }]->(l)<-[r2:TEACHES]-(t)")
.pluck(:l)
As you can see, the QueryProxy version is always shorter because it uses information from models to build queries. Behind the scenes, each of those QP chains will eventually be turned into Neo4j::Core::Query objects, but they’ll use your models to figure out labels and relationship types.
But… things aren’t always that simple.
Between Worlds
I’ve found that the key to getting the most out of Neo4j.rb is understanding how to move between the two Cypher query classes in different situations. In general, QueryProxy provides a lot of magic and is aware of ActiveNode models, associations, and scopes, but it can be a bit limiting when trying to perform advanced queries; on the other hand, Core::Query is extremely flexible, but its syntax is more explicit and it often defeats the purpose of using models, so it can slow you down.
Thankfully, the gem makes it easy to jump from one class to the other. In the 3.0 release, it was possible to go from QueryProxy to Query and in v4, we added methods that let you create a QueryProxy object from your Core::Query. Now that we have these tools, we can move between worlds as needed! The trick is knowing how to do it and when.
When moving from QueryProxy to Query, the methods are
query
and
query_as
. The difference between the two is slight:
query
will make the change without modifying the query;
query_as
will set the identifier of the last link in the chain to the symbol given as an argument.
student.where(age: 14).lessons(:l).query.with(:l).match("(l)-[taught_by:TAUGHT_BY]-(t:Teacher)").pluck(:t)
student.where(age: 14).lessons.query_as(:l).with(:l).match("(l)-[taught_by:TAUGHT_BY]-(t:Teacher)").pluck(:t)
When coming back to QueryProxy, we can use
proxy_as
. It will call
break
on the existing Core::Query to lock its methods in their current order and then create a new MATCH using QP. Its syntax is pretty simple:
proxy_as(ModelContext, :identifier, is_optional)
, with that third parameter being optional and defaulting to
false
. You could do something like this:
student.query_as(:s)
.match("(s)-[rel:FRIENDS_WITH*1..3]->(s2:Student")
.proxy_as(Student, :s2).lessons
By calling
proxy_as
, we end up back in QueryProxy and can call association methods, scopes, and class methods that are defined in the Student class. We reuse our
s2
identifier because we want to match off of that. Had we passed
true
as a third argument, it would use
OPTIONAL MATCH
. Alternatively, we could do
proxy_as_optional
and skip that third argument entirely. It provides added readability since a random
true
can sometimes be unclear.
Get Creative!
There are lots of reasons to move between the two classes. In addition to the example above, you might find yourself with a long Cypher query that might benefit from the use of
with
to improve performance. If we go back to one of our earlier examples, we already saw that we could rewrite this:
Student.where(age: 14).lessons(:l).where(level: 102).rel_where(grade: 99).teachers.where(name: 'Mr Green').pluck(:l)
…as this:
Neo4j::Core::Query.new.match(s: Student, l: Lesson, t: Teacher).where(s: { age: 14 }, l: { level: 102 }, t: { name: 'Mr Green'}).match("(s)-[r:ENROLLED_IN { grade: 99 }]->(l)<-[r2:TEACHES]-(t)").pluck(:l)
But is that the best we can do? No, it’s not. According to my Cypher profiler, in my tiny database, that requires 19 total db hits. If we tell Cypher to load the nodes first, then match within them, we only need 14 db hits, so we’ll have a more efficient query. (In practice, cutting 5 db hits makes no real difference in performance, but this number will go up with a larger database.) In Cypher, it could look like this:
MATCH (s:`Student` { age: 14 }), (l:`Lesson` { level: 102 }), (t:`Teacher` { name: 'Mr Green' })
WITH s, l, t
MATCH (s)-[r:ENROLLED_IN { grade: 99 }]->(l)<-[r2:TEACHES]-(t)
RETURN l
Using our new knowledge of Core::Query, QueryProxy, and the methods that can be used to move between them, we could write it like this:
Student.query_as(:s).match(l: Lesson, t: Teacher)
.where(s: { age: 14 }, l: { level: 102 }, t: { name: 'Mr Green' })
.with(:s, :l, :t)
.proxy_as(Student, :s).lessons(:l).teachers(:t)
.pluck(:l)
Of course, the question to ask here is whether the added complexity of the query is worth the performance gained. I can’t answer that for you, but I can tell you that knowing how and when to optimize can make the difference between an app that performs well and doesn’t. We always strive to make Query and QueryProxy generate the best possible Cypher, but it’s good to know that you can step in and do things yourself when you have the need.