GraphGists

Santa Claus is coming to... the Galaxy

The mission

A long time ago…​ Santa wasn’t really busy at Earth, and, as we may imagine, the force is strong in Santa. He therefore had the time to transport himself to a Galaxy Far Far away around the year 14 BBY (Before the Battle of Yavin).

His mission was to deliver presents to two 5 year old twins for whom the force ran strong in their family: Leia Organa in Alderaan and Luke Skywalker in Tatooine.

Unfortunately, there was only one map of The Galaxy stored in an android. After a thorough search, Santa found the android, but it wasn’t the android he was looking for, this android only possesed a Graph Database of the Galaxy.

The model

After a brief analysis, Santa was able to infere the model used to create the Database, which is depicted in the following picture:

view

This model allowed Santa to grasp a lot of intellingence about his mission.

Motivation

Plot aside, I would like to explain my main motivation for this Graphgist. Gathering data isn’t quite difficult in modern times. Gathering usable data takes more work but can be done. However, finding relationships among data is sometimes quite hard and it is harder to exploit these relationships once they’ve been found. Not all relationships are obvious and computers aren’t better than humans to spot new patterns and behaviours (at least for now).

We need to team up with computers in order to really gather useful intelligence and profit of the relationships among data. Graph databases are a natural way to model data and to allow people to effectively profit of computers for both visualization and data analysis. My main goal here is to show how much can we do with a very simple model and basic Cypher queries.

What if you see a map differently? And what if you navigate through it with different metrics and different approaches to explore not only the topology but the dynamics of the dataset?

The data

The data was gathered from the Wookiepedia and the Galaxy Map (an excelent job by @gorrrgoth). For demonstration purposes I took a subset of 46 planets that provide some interesting dynamics. The graph is the following:

The following three sections will explain how can Santa find the best path through the Galaxy to accomplish his mission by teaming with the Graph Database power and the force…​

Exploring the Galaxy

Just by looking at the model, Santa knows that a Planet BELONGS_TO a region, that a sector CONTAINS planets, that a Planet is AFFILIATED_TO a Political System and that planets are NEIGHBOR_OR or are ADJACENT_TO other planets via plain space or Hyperspace Routes, respectively. However, he also knows that these relationships don’t necesarily exist among all the entities in the data set.

The first thing to explore is the topology of the galaxy.

The Topology

Santa can know basic things as the number of planets, sectors an regions with very simple and basic queries as:

MATCH (p:Planet) RETURN count(p) as Planets

In the same way he can know there are 13 Sectors, 7 Regions and 6 Political Systems.

The next interesting thing he can ask himself is if there are any isolated planets:

MATCH (p:Planet) WHERE NOT ()-[:ADJACENT_TO|NEIGHBOR_OF]-(p)
RETURN p as IsolatedPlanets

There aren’t. Therefore, every planet is attainable via hyperspace or a reasonable short route via plain space. Next, he can know the five most populated sectors:

MATCH (s:Sector)-[:CONTAINS]->(p:Planet)
RETURN s.name as Sector, count (p) as No_of_Planets
ORDER BY No_of_Planets DESC LIMIT 5

And the 5 longer Hyperspace Routes:

MATCH ()-[r:ADJACENT_TO]->()  WITH r.hyperspaceroute AS routes
RETURN DISTINCT(routes) as HyperSpaceroute, count(routes) as length
ORDER BY length DESC LIMIT 5

Which planets belong to the same Sector as Tatooine’s?

MATCH (s:Sector)-[:CONTAINS]->(p:Planet{name: 'Tatooine'})
MATCH (p1:Planet)<-[:CONTAINS]-(s) WHERE p<>p1
RETURN p1.name as Planet

Santa can easily discover that evey planet belongs to a region, but not all planets are contained in a Sector. In which regions are these planets located?

MATCH (p:Planet)-[:BELONGS_TO]->(r:Region)
WHERE NOT ()-[:CONTAINS]->(p)
RETURN DISTINCT r.name as Region, count(p) as Number_of_Planets
ORDER BY Number_of_Planets DESC

Which are the best connected planets?

MATCH (p:Planet)-[rels:ADJACENT_TO|NEIGHBOR_OF*1]->()
RETURN p.name as Planet, count(rels) as Number_of_Connections ORDER BY Number_of_Connections DESC LIMIT 5

Since Tatooine is highly connected, and belong to the most populated sector Santa decides to start in Tatooine and then to find its way to Alderaan

The shortest path from Tatooine to Alderaan using either the Hyperspace or plain Space is the following:

MATCH p=shortestPath((p1:Planet{name: 'Tatooine'})-[r:ADJACENT_TO|NEIGHBOR_OF*]-(p2:Planet{name:'Alderaan'}))
RETURN p

The Dynamics

Once Santa has a grasp of the topology of the Galaxy, he can start understanding its dynamics. With similar queries he can easily know that there are different Political Systems in the Galaxy. Here the power of graph databases can help us to know, for example, which is the dominating political force?

MATCH (p:Planet)-[:AFFILIATED_TO]->(ps:Political_System)
RETURN distinct ps.system as Political_System, count(p) as Affiliated_Planets
ORDER BY Affiliated_Planets DESC

Which are the most politicaly stable/diverse regions?

MATCH (r:Region)<-[:BELONGS_TO]-()-[:AFFILIATED_TO]->(ps:Political_System)
RETURN r.name as Region, count(DISTINCT ps) as No_of_Political_Systems
ORDER BY No_of_Political_Systems

In which sectors and regions is the Galactic Empire better stablished?

MATCH (r:Region)<-[:BELONGS_TO]-(p:Planet)<-[:CONTAINS]-(s:Sector)
WHERE (p)-[:AFFILIATED_TO]->(:Political_System {system: 'Galactic Empire'})
RETURN distinct r.name as Region, s.name as Sector, count(distinct p) as Planets ORDER BY Planets DESC

Interacting with the graph

Now that Santa has a basic knowledge of the can now interact with the graph using Cypher. However, everything indicates he will have to deal with the Galactic Empire. After a little research he finds out he has to pay a tax of 100 Imperial Credits for every planet affiliated to the Galactic Empire he find in his path.

Which is the smallest path using the total taxes as the main metric?

First he can set the "tax" property in every planet according to their political affiliation.

MATCH (p:Planet) SET p.tax=0
MATCH (p:Planet)-[:AFFILIATED_TO]->(:Political_System {system:'Galactic Empire'})
SET p.tax=100

And now calculate the shortest path according to the tax metric:

MATCH path=(p1:Planet{name: 'Tatooine'})-[r:ADJACENT_TO|NEIGHBOR_OF*1..15]-(p2:Planet{name:'Alderaan'})
RETURN reduce(tax=0, p in nodes(path) | tax+p.tax) AS TotalTax, [p in nodes(path) | p.name] as Planet, [p in nodes(path) | p.tax] as Tax
    ORDER BY TotalTax ASC
    LIMIT 1

(I limited this query for paths of length 15 or smaller due to the Gist server limitations)

Santa can note that the path is longer (14 planets compared to 12 he found above) but total tax is smaller. He can easily verify this by calculating the total tax for the shortest path queried before:

MATCH path=shortestpath((p1:Planet{name: 'Tatooine'})-[r:ADJACENT_TO|NEIGHBOR_OF*]-(p2:Planet{name:'Alderaan'}))
RETURN reduce(tax=0, p in nodes(path) | tax+p.tax) AS TotalTax, [p in nodes(path) | p.name] as Planet, [p in nodes(path) | p.tax] as Tax
    ORDER BY TotalTax ASC
    LIMIT 1

These metrics can be set in the relationships as well, for example. The amont of fuel needed. We can assume that fuel needed for hyperspace jumps among adjacent planets is 10 times larger than for nearby planets in plain space.

He can therefore set the following properties in relationships:

MATCH ()-[r:ADJACENT_TO]->() SET r.fuel=10
MATCH ()-[r:NEIGHBOR_OF]->() SET r.fuel=1

And to use this new property to find shortest paths according to the fuel metric but keeping in mind the tax as well:

MATCH path=(p1:Planet{name: 'Tatooine'})-[r:ADJACENT_TO|NEIGHBOR_OF*1..15]-(p2:Planet{name:'Alderaan'})
RETURN reduce(fuel = 0, r in relationships(path)| fuel+r.fuel) AS TotalFuel, reduce(tax=0, p in nodes(path) | tax+p.tax) AS TotalTax, [p in nodes(path) | p.name] as Planet
    ORDER BY TotalTax ASC
    LIMIT 3

For some paths Santa would have to pay the same amount of tax, but he can easily make up his mind one according to the one where the amount of fuel is the smallest.

Now, he can be prepared to sudden changes in the Galaxy dynamics. For example, if the Empire suddenly discovers that Commenor doesn’t tax for the system and stablishes a blockage to it. Santa can find his way around Commenor with the following query and always keeping in mind to keep the tax at its smallest:

MATCH (commenor:Planet {name:'Commenor'}), path=(p1:Planet{name: 'Tatooine'})-[r:ADJACENT_TO|NEIGHBOR_OF*1..15]-(p2:Planet{name:'Alderaan'})
WHERE NONE(p IN nodes(path) WHERE p=commenor)
RETURN reduce(fuel = 0, r in relationships(path)| fuel+r.fuel) AS TotalFuel, reduce(tax=0, p in nodes(path) | tax+p.tax) AS TotalTax, [p in nodes(path) | p.name] as Planet
    ORDER BY TotalTax ASC
    LIMIT 1

Let’s suppose Santa has some friends in the Jedi Order, and that he wants to visit a planet affiliated to this Political System, he can still optimize his way adding this condition with the following query:

MATCH path=(p1:Planet{name: 'Tatooine'})-[r:ADJACENT_TO|NEIGHBOR_OF*1..15]-(p2:Planet{name:'Alderaan'})
WHERE ANY(p IN nodes(path) WHERE (p)-[:AFFILIATED_TO]-(:Political_System{system: 'Jedi Order'}))
RETURN reduce(fuel = 0, r in relationships(path)| fuel+r.fuel) AS TotalFuel, reduce(tax=0, p in nodes(path) | tax+p.tax) AS TotalTax, [p in nodes(path) | p.name] as Planet
    ORDER BY TotalTax ASC
    LIMIT 3

Santa can keep playing around with very simple queries to gather intelligence about his trip. Using metrics as time travel through hyperspace and plain space, if he has any friends that can help him with Imperial Credits at some planets, etc. May the Graph be with him.

An image of the portion of the Galaxy found in Santa’s Graph Database is the following:

thumbnail?id=0B17R8cRB8tsoRHVPUVBza2hkRTg

Application

As a physical security professional I am interested in tomorrow’s security systems. We cannot longer ignore the fact that physical and cyber world are getting closer and that security specialists must think holistically. Graph databases is an excelent way of doing this.

I recently found a great thesis authored by Carmen Cheh called The Cyber-Physical Topology Language: Definition and Operations where she develops an ontology for the modeling of cyber-physical systems. My galaxy model is a baby example of her work, where you have cyber paths (hyperspace routes) and physical paths (plain space). These objects have properties and are related with other objects in a similar way as the model depicted in this Gist.

Buildings and organizations have different systems that can be modeled as a graph, but these systems aren’t standardized enough as to model the in a unique way. Teaming up with computers we can find critical assets, penetration times, attack metrics, etc. Without being a computer expert I find Neo4j the most useful tool for achieving good security designs.


I would like to leave you with an image of the early planning of this gist…​

thumbnail?id=0B17R8cRB8tsoTmpLMW5OMmZtUEp1Z0pYZzFzcFhfOXJ0SUpF

May the Force be with you.