The Cosmic Web Paper by the Barabasi Lab
After I came across this tweet the other night,
Turns out the universe is just a very big graph database: https://t.co/tKkTLnNpqK pic.twitter.com/hh4s5pKWPI
— David J Carr (@djc1805) April 22, 2016
I checked out the original website of Cosmic Web, which is beautifully done.
Their paper describes three models for linking the galaxies in our cosmos into a network:
- Fixed-Length Model: All galaxies within a set distance l of each other are connected by an undirected link.
- Varying-Length Model: The length of each link is proportional to the “size” of the galaxy, l = a * R(i) ^ (1/2).
- Nearest Neighbors Model: Each galaxy is connected to its closest neighbors with directed links. In this model, the length of each link depends on the distance to the nearest galaxy.
The last model provided the most accurate representation of the real-world constellations.
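To make the three models concrete, here is a minimal JavaScript sketch of how each one could turn a list of galaxies into links. It is not the paper's code. The property names `x`, `y`, `z` and `size`, the parameter `k`, and the rule that the varying-length model links galaxy i to every galaxy within its own reach are assumptions made for illustration.

```javascript
// Euclidean distance between two galaxies with x, y, z coordinates.
function distance(a, b) {
  var dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return Math.sqrt(dx * dx + dy * dy + dz * dz);
}

// Fixed-Length Model: one undirected link per pair no farther apart than l.
function fixedLengthLinks(galaxies, l) {
  var links = [];
  for (var i = 0; i < galaxies.length; i++) {
    for (var j = i + 1; j < galaxies.length; j++) {
      if (distance(galaxies[i], galaxies[j]) <= l) links.push([i, j]);
    }
  }
  return links;
}

// Varying-Length Model: galaxy i has the reach l = a * R(i) ^ (1/2).
// Assumption: a directed link i -> j exists when j lies within that reach.
function varyingLengthLinks(galaxies, a) {
  var links = [];
  galaxies.forEach(function (g, i) {
    var l = a * Math.sqrt(g.size);
    galaxies.forEach(function (other, j) {
      if (i !== j && distance(g, other) <= l) links.push([i, j]);
    });
  });
  return links;
}

// Nearest Neighbors Model: directed links from each galaxy to its k closest neighbors.
function nearestNeighborLinks(galaxies, k) {
  var links = [];
  galaxies.forEach(function (g, i) {
    galaxies
      .map(function (other, j) { return { j: j, d: distance(g, other) }; })
      .filter(function (candidate) { return candidate.j !== i; })
      .sort(function (p, q) { return p.d - q.d; })
      .slice(0, k)
      .forEach(function (candidate) { links.push([i, candidate.j]); });
  });
  return links;
}
```

All three functions compare every pair of galaxies, which is fine for a toy example but would call for a spatial index on a dataset of this size.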
Graph Visualization
A visual artist, Kim Albrecht, visualized the resulting graphs beautifully using Three.js.
Working with Raw CSV Data
Fortunately for me, the raw sources for this dataset were CSV files: the galaxies form the nodes, and the different relationship types represent the linking models described in the research.
I had four CSV files to work with: one with the galaxy nodes and one each for the relationships of the three models.
Importing Data into Neo4j 3.0
With Neo4j 3.0, I could quickly import them using the LOAD CSV mechanism. Here is the full script.
create constraint on (g:Galaxy) assert g.id is unique;

// create galaxies
with "https://cosmicweb.kimalbrecht.com/viz/data/12-05-15//ccnr-universe-nodes-nn.csv" as nodes
load csv with headers from nodes as row
with collect(row) as rows
unwind range(0,size(rows)-1) as id
create (g:Galaxy {id:id}) set g+=rows[id];

// Fixed Length Model
with "https://cosmicweb.kimalbrecht.com/viz/data/12-05-15/ccnr-universe-fll-t-1-15.csv" as relationships
load csv with headers from relationships as row
match (g1:Galaxy {id:toInt(row.source)}),(g2:Galaxy {id:toInt(row.target)})
create (g1)-[:FLL]->(g2);

// Varying Length Model
with "https://cosmicweb.kimalbrecht.com/viz/data/12-05-15/ccnr-universe-vll-t-1-10.csv" as relationships
load csv with headers from relationships as row
match (g1:Galaxy {id:toInt(row.source)}),(g2:Galaxy {id:toInt(row.target)})
create (g1)-[:VLL]->(g2);

// Nearest Neighbors Model
with "https://cosmicweb.kimalbrecht.com/viz/data/12-05-15/ccnr-universe-nn-t-1-10.csv" as relationships
load csv with headers from relationships as row
match (g1:Galaxy {id:toInt(row.source)}),(g2:Galaxy {id:toInt(row.target)})
create (g1)-[:NN]->(g2);
The only trick I had to pull off was to collect the galaxies into a list first, to get an index for each row in the CSV. That’s why loading the node CSV takes longer than loading the relationships.
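The same indexing idea can be sketched in plain JavaScript: the position of a row in the node file becomes the galaxy's id, and the relationship files refer to galaxies by that position. The sample values below are made up for illustration. Only the column names `x`, `y`, `z`, `source` and `target` come from the dataset as described in this post.

```javascript
// Hypothetical rows as LOAD CSV WITH HEADERS would deliver them:
// one map per line, with every value still a string.
var rows = [
  { x: "0.1", y: "0.2", z: "0.3" },
  { x: "1.1", y: "1.2", z: "1.3" },
  { x: "2.1", y: "2.2", z: "2.3" }
];

// Counterpart of: unwind range(0,size(rows)-1) as id
//                 create (g:Galaxy {id:id}) set g+=rows[id]
var galaxies = rows.map(function (row, id) {
  return Object.assign({ id: id }, row);
});

// A relationship line such as source=0,target=2 now resolves by position.
var link = { source: "0", target: "2" };
var from = galaxies[parseInt(link.source, 10)];
var to = galaxies[parseInt(link.target, 10)];
console.log(from.id, "->", to.id);
```

Because the whole file has to be collected before the first node can be created, this step needs more memory and time than a streaming import, which matches the slower node load mentioned above.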
Query & Visualize in the Neo4j Browser
Running the import gives me some nice visual results in the Neo4j Browser.
MATCH (g:Galaxy) WHERE size( (g)--() ) = 10
WITH g LIMIT 1
MATCH (g)-[rels:NN*..7]-()
UNWIND rels as r
RETURN distinct r;
Neo4j 3.0 Bolt Binary Protocol Test
With Neo4j 3.0, I wanted to test the performance of the new binary protocol (a.k.a. Bolt). So I grabbed the JavaScript neo4j-driver from npm and retrieved all 211k nearest-neighbor (NN) relationships in one go. Just pulling the data and measuring the outcome is easy, as you can see below.
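test-neo-driver.js
var neo4j = require('neo4j-driver').v1;
var driver = neo4j.driver("bolt://localhost", neo4j.auth.basic("neo4j", "test"));
var session = driver.session();

// Builds an observer that counts streamed rows and reports the elapsed time.
var counter = function() {
  // Keep start and count in the closure so the callbacks do not depend on `this`.
  var start = Date.now();
  var count = 0;
  return {
    onNext: function(r) { count++; },
    onCompleted: function() {
      console.log("rows",count,"took",(Date.now()-start),"ms");
      // Release the connection so the process can exit.
      session.close();
      driver.close();
    },
    onError: function(error) { console.log(error); }
  };
};

session.run("CYPHER runtime=compiled MATCH (n:Galaxy)-[:NN]->(m:Galaxy) RETURN id(n),id(m)").subscribe(counter());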
NOTE: Interestingly, it took only about 330 ms to pull all that data out of the database and across the wire into my client.
test run
$ npm install neo4j-driver
$ node test-neo-driver.js
> rows 211959 took 327 ms
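To put that measurement in perspective, a quick back-of-the-envelope calculation with the two numbers from the test run gives roughly 650k rows per second, or about 1.5 microseconds per row. This is a single local run, so treat it as an indication rather than a benchmark.

```javascript
// Numbers taken from the test run above.
var rows = 211959;
var millis = 327;

// Throughput in rows per second and average time per row in microseconds.
var rowsPerSecond = Math.round(rows / millis * 1000);
var microsPerRow = millis * 1000 / rows;

console.log(rowsPerSecond, "rows/s,", microsPerRow.toFixed(2), "microseconds per row");
```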
Force Layout Graph Visualization with ngraph
Although I have no artistic talents whatsoever, I could at least try to load the data from Neo4j into Anvaka’s ngraph and let its force layout algorithm do the work.
Please note that the artistic three.js visualization mentioned above uses pre-laid-out data; the x, y, z coordinates are still available as properties in the data.
But I wanted to see how well ngraph can load and lay out 200k relationships without any preparation, just in JavaScript.
The loading was quite quick, like before. The force-layout did take some time though, but resulted in a really nice two-dimensional graph rendering of our cosmos.
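The wiring between the query result and ngraph can be sketched roughly as follows. This is not the code used for the rendering described here. The helper names `loadLinks` and `layoutPositions` are hypothetical, and the sketch assumes the ids have already been extracted from the driver's records into plain `[source, target]` pairs. The only ngraph methods it relies on are `addLink` on the graph and `step` and `getNodePosition` on the layout.

```javascript
// Adds one link per [source, target] pair to a graph object that offers
// addLink(fromId, toId), the method ngraph.graph provides.
// In ngraph.graph, addLink also creates nodes that do not exist yet.
function loadLinks(graph, pairs) {
  pairs.forEach(function (pair) {
    graph.addLink(pair[0], pair[1]);
  });
  return graph;
}

// Runs a fixed number of force-layout iterations and collects the positions.
// `layout` is expected to offer step() and getNodePosition(id), as the
// object created by ngraph.forcelayout does.
function layoutPositions(layout, nodeIds, iterations) {
  for (var i = 0; i < iterations; i++) {
    layout.step();
  }
  var positions = {};
  nodeIds.forEach(function (id) {
    positions[id] = layout.getNodePosition(id);
  });
  return positions;
}
```

With both packages installed from npm, and assuming each exports a factory function, the calls would look like `var graph = loadLinks(require('ngraph.graph')(), pairs);` followed by `layoutPositions(require('ngraph.forcelayout')(graph), nodeIds, 100);`. The iteration count of 100 is an arbitrary placeholder. A graph of this size will likely need tuning.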
You can quickly import the data yourself by running my import script after starting your Neo4j 3.0 server.
(You might need to configure in conf/neo4j.conf that the “remote-shell” is enabled.)
$NEO4J_HOME/bin/neo4j-shell -file galaxies.cypher
Conclusion
This was only me having fun with galaxies and Neo4j 3.0 around 3 a.m. If you want to read and hear from a real graph-astronomer, check out Caleb W. Jones’ work on “Using Neo4j to Take Us to the Stars”.
Ad Astra,
Michael
P.S. Graphs are everywhere – even our cosmos forms one.