apoc.meta.data
Procedure APOC Core
apoc.meta.data({config}) - examines a subset of the graph to provide tabular meta information
Signature
apoc.meta.data(config = {} :: MAP?) :: (label :: STRING?, property :: STRING?, count :: INTEGER?, unique :: BOOLEAN?, index :: BOOLEAN?, existence :: BOOLEAN?, type :: STRING?, array :: BOOLEAN?, sample :: LIST? OF ANY?, left :: INTEGER?, right :: INTEGER?, other :: LIST? OF STRING?, otherLabels :: LIST? OF STRING?, elementType :: STRING?)
Config parameters
The procedure supports the following config parameters:
name | type | default | description |
---|---|---|---|
sample | Long | 1000 | number of nodes to sample per label. See "Sampling" section below. |
Sampling
Because the count stores return an incomplete picture of the data, we have to cross-check the results with the actual data to filter out false positives.
We analyze a subset of the data, whose size is set by the sample parameter (1000 by default).
For each label, we split the nodes with that label into batches of (total / sample) ± rand, where total is the total number of nodes with that label and rand is a number between 0 and total / sample / 10.
So, we check roughly sample / total * 100 % of the nodes with that label.
We pick the first node of each batch and analyze its properties and relationships.
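The batch arithmetic described above can be sketched in Python. This is only an illustration of the formula in the text, not APOC's actual implementation: the function names `sample_plan` and `pick_sampled_positions`, the seeded random generator, and the minimum step of 1 are assumptions made for the example.

```python
import random


def sample_plan(total, sample=1000):
    """Return (base batch size, maximum jitter, approx. % of nodes checked)."""
    if total == 0:
        return 0.0, 0.0, 0.0
    base = total / sample              # total / sample
    max_jitter = base / 10             # rand lies between 0 and total / sample / 10
    percent = min(100.0, sample / total * 100)
    return base, max_jitter, percent


def pick_sampled_positions(total, sample=1000, seed=0):
    """Positions of the first node of each batch (hypothetical helper)."""
    rng = random.Random(seed)
    base, max_jitter, _ = sample_plan(total, sample)
    positions = []
    pos = 0.0
    while pos < total:
        positions.append(int(pos))     # first node of the current batch
        step = base + rng.uniform(-max_jitter, max_jitter)  # (total / sample) ± rand
        pos += max(step, 1.0)          # assumption: always advance by at least one node
    return positions


base, max_jitter, percent = sample_plan(100_000, 1000)
positions = pick_sampled_positions(100_000, 1000)
print(base, max_jitter, percent, len(positions))
```

With 100,000 nodes and the default sample of 1000, each batch holds about 100 nodes (give or take up to 10), so about 1% of the nodes with that label are inspected.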
Output parameters
Name | Type |
---|---|
label | STRING? |
property | STRING? |
count | INTEGER? |
unique | BOOLEAN? |
index | BOOLEAN? |
existence | BOOLEAN? |
type | STRING? |
array | BOOLEAN? |
sample | LIST? OF ANY? |
left | INTEGER? |
right | INTEGER? |
other | LIST? OF STRING? |
otherLabels | LIST? OF STRING? |
elementType | STRING? |
Usage Examples
The examples in this section are based on the following sample graph:
CREATE (Keanu:Person {name:'Keanu Reeves', born:1964})
CREATE (TomH:Person {name:'Tom Hanks', born:1956})
CREATE (Carrie:Person {name:'Carrie-Anne Moss', born:1967})
CREATE (Laurence:Person {name:'Laurence Fishburne', born:1961})
CREATE (Hugo:Person {name:'Hugo Weaving', born:1960})
CREATE (LillyW:Person {name:'Lilly Wachowski', born:1967})
CREATE (LanaW:Person {name:'Lana Wachowski', born:1965})
CREATE (TheMatrix:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'})
CREATE (TheMatrixReloaded:Movie {title:'The Matrix Reloaded', released:2003, tagline:'Free your mind'})
CREATE (TheMatrixRevolutions:Movie {title:'The Matrix Revolutions', released:2003, tagline:'Everything that has a beginning has an end'})
CREATE (SomethingsGottaGive:Movie {title:"Something's Gotta Give", released:2003})
CREATE (TheDevilsAdvocate:Movie {title:"The Devil's Advocate", released:1997, tagline:'Evil has its winning ways'})
CREATE (Something:Something:Else {foo: 'bar'})
CREATE (YouveGotMail:Movie {title:"You've Got Mail", released:1998, tagline:'At odds in life... in love on-line.'})
CREATE (SleeplessInSeattle:Movie {title:'Sleepless in Seattle', released:1993, tagline:'What if someone you never met, someone you never saw, someone you never knew was the only someone for you?'})
CREATE (ThatThingYouDo:Movie {title:'That Thing You Do', released:1996, tagline:'In every life there comes a time when that thing you dream becomes that thing you do'})
CREATE (CloudAtlas:Movie {title:'Cloud Atlas', released:2012, tagline:'Everything is connected'})
CREATE (Keanu)-[:ACTED_IN {roles:['Neo']}]->(TheMatrix)
CREATE (Keanu)-[:ACTED_IN {roles:['Neo']}]->(TheMatrixReloaded)
CREATE (Keanu)-[:ACTED_IN {roles:['Neo']}]->(TheMatrixRevolutions)
CREATE (Keanu)-[:ACTED_IN {roles:['Julian Mercer']}]->(SomethingsGottaGive)
CREATE (Keanu)-[:ACTED_IN {roles:['Kevin Lomax']}]->(TheDevilsAdvocate)
CREATE (TomH)-[:ACTED_IN {roles:['Joe Fox']}]->(YouveGotMail)
CREATE (TomH)-[:ACTED_IN {roles:['Sam Baldwin']}]->(SleeplessInSeattle)
CREATE (TomH)-[:ACTED_IN {roles:['Mr. White']}]->(ThatThingYouDo)
CREATE (TomH)-[:ACTED_IN {roles:['Zachry', 'Dr. Henry Goose', 'Isaac Sachs', 'Dermot Hoggins']}]->(CloudAtlas)
CREATE (Keanu)-[:LIKES {rate:10}]->(Carrie)
CREATE (Keanu)-[:LIKES {rate:6}]->(TomH)
CREATE (Keanu)-[:LIKES {rate:4}]->(Laurence)
CREATE (Keanu)-[:LIKES {rate:8}]->(Hugo)
CREATE (Keanu)-[:LIKES {rate:9}]->(LillyW)
CREATE (Keanu)-[:LIKES {rate:6}]->(LanaW)
CREATE (Keanu)-[:LIKES {rate:100}]->(TheMatrix)
CREATE (Carrie)-[:LIKES {rate:7}]->(Keanu)
CREATE (Something)-[:RELATED_TO {rate:99}]->(TheMatrix)
CREATE (Keanu)-[:RELATED_TO {rate:12}]->(TheMatrix)
CREATE (Keanu)-[:RELATED_TO {rate:12}]->(TheMatrixReloaded)
CREATE (Keanu)-[:RELATED_TO {rate:23}]->(TheMatrixRevolutions)
CREATE (Carrie)-[:RELATED_TO {rate:34}]->(TheMatrix)
CREATE (TheMatrix)-[:RELATED_TO {rate:34}]->(Laurence)
CREATE (TheMatrixReloaded)-[:RELATED_TO {rate:345}]->(LanaW)
CALL apoc.meta.data();
label | property | count | unique | index | existence | type | array | sample | left | right | other | otherLabels | elementType |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
"ACTED_IN" | "Person" | 2 | false | false | false | "RELATIONSHIP" | true | null | 4 | 0 | ["Movie"] | [] | "relationship" |
"ACTED_IN" | "roles" | 0 | false | false | false | "LIST" | true | null | 0 | 0 | [] | [] | "relationship" |
"LIKES" | "Person" | 2 | false | false | false | "RELATIONSHIP" | true | null | 4 | 1 | ["Movie", "Person"] | [] | "relationship" |
"LIKES" | "rate" | 0 | false | false | false | "INTEGER" | false | null | 0 | 0 | [] | [] | "relationship" |
"RELATED_TO" | "Person" | 2 | false | false | false | "RELATIONSHIP" | true | null | 2 | 0 | ["Movie"] | [] | "relationship" |
"RELATED_TO" | "rate" | 0 | false | false | false | "INTEGER" | false | null | 0 | 0 | [] | [] | "relationship" |
"RELATED_TO" | "Movie" | 2 | false | false | false | "RELATIONSHIP" | false | null | 1 | 2 | ["Person"] | [] | "relationship" |
"RELATED_TO" | "Something" | 1 | false | false | false | "RELATIONSHIP" | false | null | 1 | 0 | ["Movie"] | [] | "relationship" |
"RELATED_TO" | "Else" | 1 | false | false | false | "RELATIONSHIP" | false | null | 1 | 0 | ["Movie"] | [] | "relationship" |
"Person" | "RELATED_TO" | 2 | false | false | false | "RELATIONSHIP" | true | null | 2 | 0 | ["Movie"] | [] | "node" |
"Person" | "ACTED_IN" | 2 | false | false | false | "RELATIONSHIP" | true | null | 4 | 0 | ["Movie"] | [] | "node" |
"Person" | "LIKES" | 2 | false | false | false | "RELATIONSHIP" | true | null | 4 | 1 | ["Movie", "Person"] | [] | "node" |
"Person" | "born" | 0 | false | false | false | "INTEGER" | false | null | 0 | 0 | [] | [] | "node" |
"Person" | "name" | 0 | false | false | false | "STRING" | false | null | 0 | 0 | [] | [] | "node" |
"Movie" | "RELATED_TO" | 2 | false | false | false | "RELATIONSHIP" | false | null | 1 | 2 | ["Person"] | [] | "node" |
"Movie" | "title" | 0 | false | false | false | "STRING" | false | null | 0 | 0 | [] | [] | "node" |
"Movie" | "tagline" | 0 | false | false | false | "STRING" | false | null | 0 | 0 | [] | [] | "node" |
"Movie" | "released" | 0 | false | false | false | "INTEGER" | false | null | 0 | 0 | [] | [] | "node" |
"Something" | "RELATED_TO" | 1 | false | false | false | "RELATIONSHIP" | false | null | 1 | 0 | ["Movie"] | [] | "node" |
"Something" | "foo" | 0 | false | false | false | "STRING" | false | null | 0 | 0 | [] | [] | "node" |
"Else" | "RELATED_TO" | 1 | false | false | false | "RELATIONSHIP" | false | null | 1 | 0 | ["Movie"] | [] | "node" |
"Else" | "foo" | 0 | false | false | false | "STRING" | false | null | 0 | 0 | [] | [] | "node" |
The unique column shows whether there is a unique constraint on that specific label and property.
Similarly, the index column shows whether there is an index, while the existence column shows whether there is an existence constraint.
The array column indicates whether the row is of type array. When the type column is "RELATIONSHIP", the result is true if there is at least one node with 2 outgoing relationships of the relationship type given by the label or property column, for example ACTED_IN in the example above.
The left, right and count columns apply only to rows whose type column equals "RELATIONSHIP" (otherwise they are 0).
Please note that, because we examine a sample, these counts are only estimates that give an overview. How close they are to the actual values depends on the dataset and on the sample size (default 1000).
In particular, the count column indicates the number of nodes with an outgoing relationship (e.g. the row with label = ACTED_IN and property = Person has count 2 because there are 2 nodes matching (node:Person)-[:ACTED_IN]->(), i.e. (Keanu) and (TomH)).
The left value represents the ratio (rounded down) of the count of the outgoing patterns for a certain label and a specific type of relationship to count.
In Cypher, it corresponds to:
MATCH p=(start:`<LABEL>`)-[:`<TYPE>`]->()
WITH count(distinct start) as nodes, count(p) as counts
RETURN CASE when nodes = 0 then 0 else counts / nodes end
For example, for the row with label = RELATED_TO and property = Movie, there are 2 relationships matching (:Movie)-[rel:RELATED_TO]->(), i.e. (TheMatrix)-[:RELATED_TO {rate:34}]->(Laurence) and (TheMatrixReloaded)-[:RELATED_TO {rate:345}]->(LanaW).
So the left value is 1 (2 relationships divided by 2 nodes found).
By contrast, for the row with label = LIKES and property = Person, there are 8 relationships matching (:Person)-[rel:LIKES]->(), 7 starting from the (Keanu) node and 1 from (Carrie).
So the left value is 4 (8 relationships divided by 2 nodes found).
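The two left values above can be reproduced with a small Python sketch over the LIKES and RELATED_TO relationships of the sample graph. It mirrors the Cypher query on the full data rather than on a sample, and it is not APOC's implementation: the `LABELS` and `EDGES` structures and the `left_value` function are hypothetical names introduced for the illustration.

```python
# Labels of the nodes that take part in LIKES / RELATED_TO relationships.
LABELS = {
    "Keanu": {"Person"}, "Carrie": {"Person"}, "TomH": {"Person"},
    "Laurence": {"Person"}, "Hugo": {"Person"}, "LillyW": {"Person"},
    "LanaW": {"Person"},
    "TheMatrix": {"Movie"}, "TheMatrixReloaded": {"Movie"},
    "TheMatrixRevolutions": {"Movie"},
    "Something": {"Something", "Else"},
}

# (start node, relationship type, end node), copied from the sample graph.
EDGES = [
    ("Keanu", "LIKES", "Carrie"), ("Keanu", "LIKES", "TomH"),
    ("Keanu", "LIKES", "Laurence"), ("Keanu", "LIKES", "Hugo"),
    ("Keanu", "LIKES", "LillyW"), ("Keanu", "LIKES", "LanaW"),
    ("Keanu", "LIKES", "TheMatrix"), ("Carrie", "LIKES", "Keanu"),
    ("Something", "RELATED_TO", "TheMatrix"),
    ("Keanu", "RELATED_TO", "TheMatrix"),
    ("Keanu", "RELATED_TO", "TheMatrixReloaded"),
    ("Keanu", "RELATED_TO", "TheMatrixRevolutions"),
    ("Carrie", "RELATED_TO", "TheMatrix"),
    ("TheMatrix", "RELATED_TO", "Laurence"),
    ("TheMatrixReloaded", "RELATED_TO", "LanaW"),
]


def left_value(label, rel_type):
    """Outgoing relationships of rel_type per distinct start node, rounded down."""
    starts = [s for s, t, _ in EDGES if t == rel_type and label in LABELS[s]]
    nodes = len(set(starts))           # count(distinct start)
    return 0 if nodes == 0 else len(starts) // nodes


print(left_value("Movie", "RELATED_TO"))   # 2 relationships / 2 nodes
print(left_value("Person", "LIKES"))       # 8 relationships / 2 nodes
```

The integer division plays the role of the "rounded down" ratio in the description.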
The right value is the ratio (rounded down) of the count of the incoming patterns for a certain label and a specific type of relationship to count, where only the patterns whose node also has an equivalent outgoing relationship are included.
In Cypher, it corresponds to:
MATCH p=(start:`<LABEL>`)<-[:`<TYPE>`]-()
WHERE exists((start:`<LABEL>`)-[:`<TYPE>`]->())
WITH count(distinct start) as nodes, count(p) as counts
RETURN CASE when nodes = 0 then 0 else counts / nodes end
For example, for the row with label = RELATED_TO and property = Movie, the (TheMatrix) node, which has an outgoing RELATED_TO relationship, has 3 incoming relationships as well, while the (TheMatrixReloaded) node has 1 incoming relationship.
So the right value is 2, that is, 4 relationships divided by 2 nodes found.
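The same calculation can be sketched in Python for the right value. As before, this is an illustration over the full sample graph, not APOC's sampled implementation, and `LABELS`, `EDGES` and `right_value` are hypothetical names introduced for the example.

```python
# Labels of the nodes that take part in LIKES / RELATED_TO relationships.
LABELS = {
    "Keanu": {"Person"}, "Carrie": {"Person"}, "TomH": {"Person"},
    "Laurence": {"Person"}, "Hugo": {"Person"}, "LillyW": {"Person"},
    "LanaW": {"Person"},
    "TheMatrix": {"Movie"}, "TheMatrixReloaded": {"Movie"},
    "TheMatrixRevolutions": {"Movie"},
    "Something": {"Something", "Else"},
}

# (start node, relationship type, end node), copied from the sample graph.
EDGES = [
    ("Keanu", "LIKES", "Carrie"), ("Keanu", "LIKES", "TomH"),
    ("Keanu", "LIKES", "Laurence"), ("Keanu", "LIKES", "Hugo"),
    ("Keanu", "LIKES", "LillyW"), ("Keanu", "LIKES", "LanaW"),
    ("Keanu", "LIKES", "TheMatrix"), ("Carrie", "LIKES", "Keanu"),
    ("Something", "RELATED_TO", "TheMatrix"),
    ("Keanu", "RELATED_TO", "TheMatrix"),
    ("Keanu", "RELATED_TO", "TheMatrixReloaded"),
    ("Keanu", "RELATED_TO", "TheMatrixRevolutions"),
    ("Carrie", "RELATED_TO", "TheMatrix"),
    ("TheMatrix", "RELATED_TO", "Laurence"),
    ("TheMatrixReloaded", "RELATED_TO", "LanaW"),
]


def right_value(label, rel_type):
    """Incoming relationships per node that also has an outgoing one, rounded down."""
    typed = [(s, e) for s, t, e in EDGES if t == rel_type]
    # Nodes with the label that have at least one outgoing relationship (the WHERE exists(...) filter).
    has_outgoing = {s for s, _ in typed if label in LABELS[s]}
    incoming = [e for _, e in typed if e in has_outgoing]
    nodes = len(set(incoming))         # count(distinct start)
    return 0 if nodes == 0 else len(incoming) // nodes


print(right_value("Movie", "RELATED_TO"))  # 4 incoming relationships / 2 nodes
print(right_value("Person", "LIKES"))      # 2 incoming relationships / 2 nodes
```

For Person and RELATED_TO the result is 0, because neither (Keanu) nor (Carrie) has an incoming RELATED_TO relationship, which agrees with the right column of the results table.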
Therefore, via the right and left values, we provide a dataset estimate of the possible degree averages.