Known limitations of security

This section explains known limitations and implications of Neo4j's role-based access control security.

1. Security and indexes

As described in Indexes for search performance, Neo4j 4.1 supports the creation and use of indexes to improve the performance of Cypher queries. The Neo4j security model affects the results of queries regardless of whether indexes are used. When using non-full-text Neo4j indexes, a Cypher query will always return the same results it would have if no index existed. This means that if the security model causes fewer results to be returned due to restricted read access in Graph and sub-graph access control, the index will also return the same reduced set of results.

However, this rule is not fully obeyed by Indexes for full-text search. These specific indexes are backed by Lucene internally. It is therefore not possible to know for certain whether a security violation occurred for each specific entry returned from the index. As a result, Neo4j will return zero results from full-text indexes if it is determined that any result might violate the security privileges active for that query.

Since full-text indexes are not automatically used by Cypher, this does not lead to the case where the same Cypher query would return different results simply because such an index was created. Users need to explicitly call procedures to use these indexes. The only problem is that a user who does not understand this behavior might expect the full-text index to return the same results as a different, but semantically similar, Cypher query.

1.1. Example with denylisted properties

Consider the following example. The database has nodes with label :User and these have properties name and surname. We have indexes on both properties:

CREATE INDEX FOR (n:User) ON (n.name, n.surname);
CALL db.index.fulltext.createNodeIndex("userNames",["User", "Person"],["name", "surname"]);

Full-text indexes also support multiple labels. See Indexes for full-text search for more details on creating and using full-text indexes.

After creating these indexes, it would appear we have two indexes accomplishing the same thing. However, this is not completely accurate. These two indexes behave in different ways and are focused on different use cases. A key difference is that full-text indexes are backed by Lucene, and will use the Lucene syntax for querying the index.

This has consequences for users restricted on the labels or properties involved in the indexes. Ideally, if the labels and properties in the index are denylisted, we can correctly return zero results from both native indexes and full-text indexes. However, there are borderline cases where this is not as simple.

Imagine the following nodes were added to the database:

CREATE (:User {name:'Mark', surname:'Andy'});
CREATE (:User {name:'Andy', surname:'Anderson'});
CREATE (:User:Person {name:'Mandy', surname:'Smith'});
CREATE (:User:Person {name:'Joe', surname:'Andy'});

Consider denylisting on the label :Person.

DENY TRAVERSE ON GRAPH * NODES Person TO users;

If the user runs a query that will use the native index:

MATCH (n:User) WHERE n.name CONTAINS 'ndy' RETURN n.name;

The query engine will process this query in three steps:

  • do a scan on the index to create a stream of results of nodes with the name property, which leads to four results

  • filter the results to include only nodes where n.name CONTAINS 'ndy', filtering out Mark and Joe so we have two results

  • filter the results to exclude nodes that also have the label :Person, filtering out Mandy so we only have one result

For the above dataset, we can see we will get one result.
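The three steps above can be mimicked with a small Python model. This is only a sketch under stated assumptions: nodes are plain dictionaries, the index scan is an ordinary list filter, and the names `NODES`, `DENIED_LABELS`, and `native_index_query` are hypothetical and not part of Neo4j.

```python
# Toy model of the three steps above; plain Python data, not Neo4j internals.
NODES = [
    {"labels": {"User"}, "name": "Mark", "surname": "Andy"},
    {"labels": {"User"}, "name": "Andy", "surname": "Anderson"},
    {"labels": {"User", "Person"}, "name": "Mandy", "surname": "Smith"},
    {"labels": {"User", "Person"}, "name": "Joe", "surname": "Andy"},
]
DENIED_LABELS = {"Person"}  # stands in for the DENY TRAVERSE on :Person


def native_index_query(nodes, substring, denied_labels):
    # Step 1: the index scan yields every :User node with a name property.
    scanned = [n for n in nodes if "User" in n["labels"] and "name" in n]
    # Step 2: keep only nodes whose name contains the substring.
    matched = [n for n in scanned if substring in n["name"]]
    # Step 3: drop nodes that carry a denied label.
    visible = [n for n in matched if not (n["labels"] & denied_labels)]
    return len(scanned), len(matched), [n["name"] for n in visible]


print(native_index_query(NODES, "ndy", DENIED_LABELS))  # (4, 2, ['Andy'])
```

The counts at each stage match the four, two, and one results described in the steps.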

What if we query the same data using the full-text index?

CALL db.index.fulltext.queryNodes("userNames", "ndy") YIELD node, score
RETURN node.name

The problem now is that we do not know if the results provided by the index were because of a match to the name or the surname property. The steps taken by the query engine would be:

  • run a Lucene query on the full-text index to produce results containing ndy in either property, leading to four results.

  • filter the results to exclude nodes that also have the label :Person, filtering out Mandy and Joe so we have two results.

This difference in results is due to the OR relationship between the two properties in the index creation.
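The two full-text steps can be modeled in the same spirit. The sketch below assumes, as the description does, that the Lucene query behaves like a "contains" match on either indexed property; real Lucene analysis and scoring are not modeled, and `fulltext_query` is a hypothetical helper, not a Neo4j API.

```python
# Toy model of the full-text steps; the Lucene match is approximated by a
# substring test on each indexed property (an assumption for illustration).
NODES = [
    {"labels": {"User"}, "name": "Mark", "surname": "Andy"},
    {"labels": {"User"}, "name": "Andy", "surname": "Anderson"},
    {"labels": {"User", "Person"}, "name": "Mandy", "surname": "Smith"},
    {"labels": {"User", "Person"}, "name": "Joe", "surname": "Andy"},
]


def fulltext_query(nodes, term, index_props, denied_labels):
    # Step 1: a hit is any node where the term matches name OR surname.
    hits = [n for n in nodes if any(term in n.get(p, "") for p in index_props)]
    # Step 2: drop nodes that carry a denied label.
    visible = [n for n in hits if not (n["labels"] & denied_labels)]
    return len(hits), [n["name"] for n in visible]


print(fulltext_query(NODES, "ndy", ["name", "surname"], {"Person"}))
# (4, ['Mark', 'Andy'])
```

Mark appears here only because of a match on surname, which is why the full-text result differs from the single result of the native index query.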

1.2. Denylisting properties

Now consider denying access on properties, like the surname property:

DENY READ {surname} ON GRAPH * TO users;

Now we run the same queries again:

MATCH (n:User) WHERE n.name CONTAINS 'ndy' RETURN n.name;

This query will operate exactly as before, returning the same single result, because nothing in this query relates to the denylisted property.

But consider the full-text index query:

CALL db.index.fulltext.queryNodes("userNames", "ndy") YIELD node, score
RETURN node.name

The problem now is that we do not know if the results provided by the index were because of a match to the name or the surname property. Results from the surname need to be excluded by the security rules, because they require that the user cannot see any surname properties. However, the security model is not able to introspect the Lucene query to know what it will actually do, whether it works only on the allowed name property, or also on the disallowed surname property. We know that the earlier query returned a match for Joe Andy which should now be filtered out. So, in order to never return results the user should not be able to see, we have to block all results. The steps taken by the query engine would be:

  • Determine if the full-text index includes denylisted properties

  • If yes, return an empty results stream, otherwise process as before

The query will therefore return zero results in this case, rather than simply returning only the Andy result that might be expected.
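The all-or-nothing rule described above can be sketched as a guard placed in front of the index lookup. This is an illustrative model only: the function name and the set-based privilege representation are assumptions, and the Lucene match is again approximated by a substring test.

```python
# Toy model: if the index covers any denied property, return nothing at all.
NODES = [
    {"labels": {"User"}, "name": "Mark", "surname": "Andy"},
    {"labels": {"User"}, "name": "Andy", "surname": "Anderson"},
    {"labels": {"User", "Person"}, "name": "Mandy", "surname": "Smith"},
    {"labels": {"User", "Person"}, "name": "Joe", "surname": "Andy"},
]


def guarded_fulltext_query(nodes, term, index_props, denied_props, denied_labels):
    # Step 1: check whether the index includes a denylisted property.
    if set(index_props) & set(denied_props):
        return []  # cannot tell which property matched, so block everything
    # Step 2: otherwise process as before (match, then label filtering).
    hits = [n for n in nodes if any(term in n.get(p, "") for p in index_props)]
    return [n["name"] for n in hits if not (n["labels"] & denied_labels)]


props = ["name", "surname"]
print(guarded_fulltext_query(NODES, "ndy", props, {"surname"}, {"Person"}))  # []
print(guarded_fulltext_query(NODES, "ndy", props, set(), {"Person"}))  # ['Mark', 'Andy']
```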

2. Security and labels

2.1. Traversing the graph with multi-labeled nodes

The general influence of access control privileges on graph traversal is described in detail in Graph and sub-graph access control. The following section will only focus on nodes because of their ability to have multiple labels. Relationships can only ever have one type and thus they do not exhibit the behavior this section aims to clarify. While this section will not mention relationships further, the general function of the traverse privilege also applies to them.

For any node that is traversable, due to GRANT TRAVERSE or GRANT MATCH, the user can get information about the labels attached to the node by calling the built-in labels() function. In the case of nodes with multiple labels, this can seemingly result in the return of labels to which the user was not directly granted access.

To give an illustrative example, imagine a graph with three nodes: one labeled :A, one labeled :B and one with :A :B. We also have a user with a role custom as defined by:

GRANT TRAVERSE ON GRAPH * NODES A TO custom;

If that user were to execute

MATCH (n:A) RETURN n, labels(n);

two nodes would be returned: the node that was labeled with :A and the node with labels :A :B.

In contrast, executing

MATCH (n:B) RETURN n, labels(n);

will return only the one node that has both labels: :A :B. Even though :B was not allowed access for traversal, there is one node with that label accessible in the data because of the allowlisted label :A that is attached to the same node.

If a user is denied traverse on a label, they will never get results from any node that has this label attached to it. Thus, the label name will never show up for them. For our example this can be done by executing:

DENY TRAVERSE ON GRAPH * NODES B TO custom;

The query

MATCH (n:A) RETURN n, labels(n);

will now return the node only labeled with :A, while the query

MATCH (n:B) RETURN n, labels(n);

will now return no nodes.
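The behavior in this section can be summarized with a small model: a node is traversable if it carries at least one granted label and no denied label, and labels() then reports every label on that node. The sketch below assumes this simplified rule and uses hypothetical names (`GRAPH`, `match_label`); it is not how Neo4j evaluates privileges internally.

```python
# Toy model of label-based traversal for the three-node example graph.
GRAPH = [{"A"}, {"B"}, {"A", "B"}]  # each set holds the labels of one node


def match_label(graph, label, granted, denied):
    """Return the label lists of nodes matched by MATCH (n:<label>)."""
    return [
        sorted(labels)
        for labels in graph
        if label in labels          # the pattern requires this label
        and labels & granted        # at least one granted label
        and not (labels & denied)   # and no denied label
    ]


# Only GRANT TRAVERSE on :A
print(match_label(GRAPH, "A", {"A"}, set()))  # [['A'], ['A', 'B']]
print(match_label(GRAPH, "B", {"A"}, set()))  # [['A', 'B']]

# After additionally denying traverse on :B
print(match_label(GRAPH, "A", {"A"}, {"B"}))  # [['A']]
print(match_label(GRAPH, "B", {"A"}, {"B"}))  # []
```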

2.2. The db.labels() procedure

In contrast to the normal graph traversal described in the previous section, the built-in db.labels() procedure does not process the data graph itself but rather the security rules defined on the system graph. That means:

  • if a label is explicitly allowlisted (granted), it will be returned by this procedure.

  • if a label is denied or is not explicitly allowed, it will not be returned by this procedure.

To reuse the example of the previous section: imagine a graph with three nodes: one labeled :A, one labeled :B and one with :A :B. We also have a user with a role custom as defined by:

GRANT TRAVERSE ON GRAPH * NODES A TO custom;

This means that only label :A is explicitly allowlisted. Thus, executing

CALL db.labels();

will only return label :A because that is the only label for which traversal was granted.
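To make the contrast with traversal concrete, the following sketch compares the labels a user can observe through labels() with what a privilege-based listing would return. Both functions are hypothetical simplifications of the rules described in this section, not Neo4j's implementation.

```python
# Toy contrast: labels observed via traversal versus labels derived from
# privileges alone (the db.labels() behavior described above).
GRAPH = [{"A"}, {"B"}, {"A", "B"}]
GRANTED, DENIED = {"A"}, set()


def labels_seen_by_traversal(graph, granted, denied):
    # Collect every label on every traversable node.
    visible = [ls for ls in graph if ls & granted and not (ls & denied)]
    return sorted(set().union(*visible))


def db_labels(granted, denied):
    # Looks only at the security rules, never at the data graph.
    return sorted(granted - denied)


print(labels_seen_by_traversal(GRAPH, GRANTED, DENIED))  # ['A', 'B']
print(db_labels(GRANTED, DENIED))                        # ['A']
```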

3. Security and count store operations

The rules of a security model may impact the performance of some database operations, because the necessary security checks incur additional data accesses. The difference can be especially noticeable for count store operations, which are usually very fast lookups.

Let’s look at the following security rules that set up a restricted and a free role as an example:

GRANT TRAVERSE ON GRAPH * NODES Person TO restricted;
DENY TRAVERSE ON GRAPH * NODES Customer TO restricted;
GRANT TRAVERSE ON GRAPH * ELEMENTS * TO free;

Now, let’s look at what the database needs to do in order to execute the following query:

MATCH (n:Person) RETURN count(n);

For both roles the execution plan will look like this:

+--------------------------+
| Operator                 |
+--------------------------+
| +ProduceResults          |
| |                        +
| +NodeCountFromCountStore |
+--------------------------+

Internally, however, very different operations need to be executed. The following table illustrates the difference.

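+-------------------------------------+---------------------------------------------------------+
| User with free role                 | User with restricted role                               |
+-------------------------------------+---------------------------------------------------------+
| The database can access the count   | The database cannot just access the count store because |
| store and retrieve the total number | it must make sure that only traversable nodes with the  |
| of nodes with the label :Person.    | desired label :Person are counted. Due to this, each    |
| This is a very quick operation.     | node with the :Person label needs to be accessed and    |
|                                     | examined to make sure that it does not also have a      |
|                                     | denylisted label, such as :Customer. Due to the         |
|                                     | additional data accesses that the security checks need  |
|                                     | to do, this operation will be slower compared to        |
|                                     | executing the query as an unrestricted user.            |
+-------------------------------------+---------------------------------------------------------+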
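The cost difference between the two roles can be illustrated with a rough Python model, in which the count store is a precomputed counter and the restricted path has to inspect each node. The data, the `COUNT_STORE` name, and both functions are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical data: each set holds the labels of one node.
NODES = [{"Person"}, {"Person", "Customer"}, {"Person"}, {"Customer"}]

# Precomputed label counts, standing in for the count store.
COUNT_STORE = Counter(label for labels in NODES for label in labels)


def count_free(label):
    # Unrestricted role: a single lookup, no node is touched.
    return COUNT_STORE[label]


def count_restricted(label, denied):
    # Restricted role: every node with the label must be examined to make
    # sure it does not also carry a denied label.
    return sum(1 for labels in NODES if label in labels and not (labels & denied))


print(count_free("Person"))                      # 3
print(count_restricted("Person", {"Customer"}))  # 2
```

The restricted count does work proportional to the number of nodes, whereas the lookup for the free role stays constant regardless of graph size.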