{"cells":[{"cell_type":"markdown","source":["# Multidimensional graph metrics with Neo4j and Cypher\n\n","The goal of this Gist is to demonstrate the graph metrics collected in our paper [Towards the characterization of realistic models: evaluation of multidisciplinary graph metrics](https://dl.acm.org/citation.cfm?id=2976786).\n"],"metadata":{}},{"cell_type":"markdown","source":["## Dataset\n\n","As an example, we use a simple railway network consisting of two *Routes*, three *Segments*, a *Switch* and three *Sensors*.\n","\n","![railway 1](https://raw.githubusercontent.com/szarnyasg/neo4j-metrics/master/gfx/railway-1.png)\n","\n","One possible graph representation of the example is this:\n","\n","![railway 2](https://raw.githubusercontent.com/szarnyasg/neo4j-metrics/master/gfx/railway-2.png)\n","\n","This [short video](https://youtu.be/95WeVRh7SmM) demonstrates how the railway network is transformed to a graph.\n","The `CREATE` query below, run after connecting to the database, creates the graph.\n","To execute Cypher queries, make sure that the IPython extension `icypher` is installed.\n","If it is not, run the following command to install it:\n"],"metadata":{}},{"cell_type":"code","execution_count":0,"metadata":{"slideshow":{"slide_type":"fragment"}},"outputs":[],"source":["pip install icypher"]},{"cell_type":"markdown","source":["Then, load the `icypher` extension:\n"],"metadata":{}},{"cell_type":"code","execution_count":0,"metadata":{"slideshow":{"slide_type":"fragment"}},"outputs":[],"source":["%load_ext icypher"]},{"cell_type":"markdown","source":["Now you’re ready to connect to your Neo4j database:\n"],"metadata":{}},{"cell_type":"code","execution_count":0,"metadata":{"slideshow":{"slide_type":"fragment"}},"outputs":[],"source":["%cypher http://user:passwd@localhost:7474/db/data"]},{"cell_type":"code","execution_count":0,"metadata":{"slideshow":{"slide_type":"fragment"}},"outputs":[],"source":["%%cypher\n","CREATE\n"," // nodes\n"," (route1:Route {name:\"Route1\"}), (route2:Route {name:\"Route2\"}),\n"," (sensorA:Sensor {name:\"SensorA\"}), (sensorB:Sensor {name:\"SensorB\"}), (sensorC:Sensor {name:\"SensorC\"}),\n"," (segment1:Segment {name:\"Segment1\"}), (segment2:Segment {name:\"Segment2\"}), (segment3:Segment {name:\"Segment3\"}),\n"," (sw:Switch {name:\"Switch\"}),\n"," (swP1:SwitchPosition {name:\"SwP1\", position:\"DIVERGING\"}), (swP2:SwitchPosition {name:\"SwP2\", position:\"STRAIGHT\"}),\n"," // requires edges\n"," (route1)-[:requires]->(sensorA),\n"," (route1)-[:requires]->(sensorB),\n"," (route1)-[:requires]->(sensorC),\n"," (route2)-[:requires]->(sensorA),\n"," (route2)-[:requires]->(sensorC),\n"," // monitoredBy edges\n"," (segment1)-[:monitoredBy]->(sensorA),\n"," (sw)-[:monitoredBy]->(sensorA),\n"," (sw)-[:monitoredBy]->(sensorC),\n"," (segment2)-[:monitoredBy]->(sensorB),\n"," (segment3)-[:monitoredBy]->(sensorC),\n"," // connectsTo edges\n"," (segment1)-[:connectsTo]->(sw),\n"," (sw)-[:connectsTo]->(segment2),\n"," (sw)-[:connectsTo]->(segment3),\n"," // target edges\n"," (swP1)-[:target]->(sw),\n"," (swP2)-[:target]->(sw),\n"," // follows edges\n"," (route1)-[:follows]->(swP1),\n"," (route2)-[:follows]->(swP2)"]},{"cell_type":"markdown","source":["In the following, we present the formal definitions of the metrics and evaluate them on the example graph using Cypher queries. 
For the details of the notation, see the [paper](https://dl.acm.org/citation.cfm?id=2976786).\n"],"metadata":{}},{"cell_type":"markdown","source":["## One-dimensional metrics\n\n","### Clustering coefficient\n\n"],"metadata":{}},{"cell_type":"code","execution_count":0,"metadata":{"slideshow":{"slide_type":"fragment"}},"outputs":[],"source":["%%cypher\n","MATCH (v)\n","// possible: number of ordered pairs of distinct neighbours of v\n","OPTIONAL MATCH (v)-[r1]-(a1), (v)-[q1]-(b1)\n","WHERE a1 <> b1 AND r1 <> q1\n","WITH DISTINCT v, a1, b1\n","WITH DISTINCT v, toFloat(COUNT(a1)) AS possible\n","\n","// actual: number of those pairs that are themselves connected by an edge\n","OPTIONAL MATCH (v)-[r2]-(a2)-[]-(b2)-[q2]-(v)\n","WHERE a2 <> b2 AND r2 <> q2\n","WITH DISTINCT v, a2, b2, possible\n","WHERE possible <> 0\n","WITH DISTINCT v, COUNT(a2) AS actual, possible\n","WITH v, actual/possible AS c\n","RETURN v.name, round(10^4 * toFloat(c))/10^4 AS c\n","ORDER BY v.name"]},{"cell_type":"markdown","source":["## Multidimensional metrics\n\n","### Metrics interpreted on dimension-node pairs\n\n","#### Dimensional degree\n\n"],"metadata":{}},{"cell_type":"code","execution_count":0,"metadata":{"slideshow":{"slide_type":"fragment"}},"outputs":[],"source":["%%cypher\n","MATCH (v)-[e]-()\n","// dd: number of edges incident to v, counted per edge type (dimension)\n","RETURN DISTINCT type(e) AS dimension, v.name, COUNT(e) AS dd\n","ORDER BY v.name, dimension"]},{"cell_type":"markdown","source":["### Metrics interpreted on dimensions\n\n","#### Node dimension activity\n\n","#### Node dimension connectivity\n\n","#### Edge dimension activity\n\n","#### Edge dimension connectivity\n\n","The following query computes all four of the metrics above for each dimension: `nda` counts the nodes that have at least one edge in the dimension, `ndc` divides `nda` by the total number of nodes, `eda` counts the edges in the dimension, and `edc` divides `eda` by the total number of edges.\n"],"metadata":{}},{"cell_type":"code","execution_count":0,"metadata":{"slideshow":{"slide_type":"fragment"}},"outputs":[],"source":["%%cypher\n","MATCH (v)\n","OPTIONAL MATCH (v)-[e]-()\n","// totals for the whole graph, used as denominators below\n","WITH\n"," toFloat(COUNT(DISTINCT v)) AS numberOfVertices,\n"," toFloat(COUNT(DISTINCT e)) AS numberOfEdges\n","\n","// per-dimension counts of distinct nodes and edges\n","MATCH (v)-[e]-()\n","WITH\n"," DISTINCT type(e) AS dimension,\n"," COUNT(DISTINCT v) AS nda,\n"," COUNT(DISTINCT v)/numberOfVertices AS ndc,\n"," COUNT(DISTINCT e) AS eda,\n"," COUNT(DISTINCT e)/numberOfEdges AS edc\n","RETURN\n"," dimension,\n"," nda,\n"," round(10^4 * toFloat(ndc))/10^4 AS ndc,\n"," eda,\n"," round(10^4 * toFloat(edc))/10^4 AS edc\n","\n","ORDER BY dimension"]},{"cell_type":"markdown","source":["### Metrics interpreted on nodes\n\n","#### Node activity\n\n"],"metadata":{}},{"cell_type":"code","execution_count":0,"metadata":{"slideshow":{"slide_type":"fragment"}},"outputs":[],"source":["%%cypher\n","MATCH (v)-[e]-()\n","// na: number of distinct edge types (dimensions) that v takes part in\n","RETURN v.name, COUNT(DISTINCT type(e)) AS na\n","ORDER BY v.name"]},{"cell_type":"markdown","source":["#### Multiplex participation coefficient\n\n"],"metadata":{}},{"cell_type":"code","execution_count":0,"metadata":{"slideshow":{"slide_type":"fragment"}},"outputs":[],"source":["%%cypher\n","MATCH (v)-[e]-()\n","// numberOfDimensions: number of distinct edge types in the whole graph\n","WITH\n"," toFloat(COUNT(e)) AS degreeTotal,\n"," toFloat(COUNT(DISTINCT type(e))) AS numberOfDimensions\n","\n","MATCH (v)-[e]-()\n","// degree of v in each dimension, then the total degree of v\n","WITH v, type(e) AS dimension, COUNT(e) AS dimensionalDegree, degreeTotal, numberOfDimensions\n","WITH v, COLLECT(dimensionalDegree) AS dimensionalDegrees, toFloat(SUM(dimensionalDegree)) AS vertexDegreeTotal, degreeTotal, numberOfDimensions\n","// mpc = M/(M-1) * (1 - sum over dimensions of (dimensional degree / total degree)^2), where M = numberOfDimensions\n","WITH\n"," v,\n"," numberOfDimensions / (numberOfDimensions-1) *\n"," (1 - REDUCE(deg = 0.0, x in dimensionalDegrees | deg + (x/vertexDegreeTotal)^2)) AS mpc\n","RETURN v.name, round(10^4 * toFloat(mpc))/10^4 AS mpc\n","\n","ORDER BY v.name"]},{"cell_type":"markdown","source":["#### Dimensional clustering coefficients\n\n","DC1 variant\n"],"metadata":{}},{"cell_type":"code","execution_count":0,"metadata":{"slideshow":{"slide_type":"fragment"}},"outputs":[],"source":["%%cypher\n","MATCH (v)\n","// possible: ordered pairs of distinct neighbours reached from v via edges of the same type\n","OPTIONAL MATCH (v)-[r1]-(a1), (v)-[q1]-(b1)\n","WHERE a1 <> b1 AND type(r1) = type(q1)\n","WITH DISTINCT v, a1, b1\n","WITH DISTINCT v, toFloat(COUNT(a1)) AS possible\n","WHERE possible <> 0\n","\n","// actual: those pairs that are connected by an edge of a different type\n","OPTIONAL MATCH (v)-[r2]-(a2)-[s2]-(b2)-[q2]-(v)\n","WHERE a2 <> b2 AND type(r2) = type(q2) AND type(r2) <> type(s2)\n","WITH DISTINCT v, a2, b2, possible\n","WITH DISTINCT v, COUNT(a2) AS actual, possible\n","WITH v, actual/possible AS dc1\n","RETURN v.name, round(10^4 * toFloat(dc1))/10^4 AS dc1\n","ORDER BY v.name"]},{"cell_type":"markdown","source":["DC2 variant\n","\n","### Metrics interpreted on dimension pairs\n\n","The *node activity* binary vector of a node *v* has one entry per dimension: the entry is 1 if *v* has at least one edge in that dimension and 0 otherwise.\n","\n","Using this vector, the **pairwise multiplexity** of two dimensions is the fraction of all nodes that are active in both dimensions.\n"],"metadata":{}},{"cell_type":"code","execution_count":0,"metadata":{"slideshow":{"slide_type":"fragment"}},"outputs":[],"source":["%%cypher\n","MATCH (v)\n","WITH toFloat(COUNT(DISTINCT v)) AS numberOfVertices\n","// enumerate every ordered pair of dimensions (d1, d2)\n","MATCH (v)-[e]-()\n","WITH DISTINCT numberOfVertices, type(e) AS d1\n","MATCH (v)-[e]-()\n","WITH DISTINCT numberOfVertices, d1, type(e) AS d2\n","// count the nodes that have a d1 edge and a d2 edge; e1 and e2 are distinct relationships, so for d1 = d2 a node needs two edges of that type\n","OPTIONAL MATCH ()-[e1]-(v)-[e2]-()\n","WHERE type(e1) = d1 AND type(e2) = d2\n","WITH DISTINCT numberOfVertices, d1, d2, v\n","WITH DISTINCT d1, d2, COUNT(v)/numberOfVertices AS pairwise_multiplexity\n","RETURN d1, d2, round(10^4 * toFloat(pairwise_multiplexity))/10^4 AS pairwise_multiplexity\n","ORDER BY d1, d2"]}],"metadata":{"language_info":{"name":"python","version":"3.9.1"}},"nbformat":4,"nbformat_minor":4}