Neo4j’s query language Cypher supports loading data from CSV directly but not from JSON files or URLs.

Almost every site offers some kind of API or endpoint that returns JSON and we can also query many NOSQL databases via HTTP and get JSON responses back.

It’s quite useful to be able to ingest document structured information from all those different sources into a more usable graph model.

I want to show here that retrieving that data and ingesting it into Neo4j using Cypher is really straightforward and takes only little effort.

As Cypher is already pretty good at deconstructing nested documents, it’s actually not that hard to achieve it from a tiny program.

I want to show you today how you can achieve this from Python, Javascript, Ruby, Java, and Bash.

The Domain: Stack Overflow


Being a developer I love Stack Overflow; just crossed 20k reputation by only answering 1100 Neo4j-related questions :). You can do that too. That’s why I want to use Stack Overflow users with their questions, answers, comments and tags as our domain today.

Loading JSON Data from a Stack Overflow URL


Pulling Stack Overflow information into a graph model allows me to find interesting insights, like:

    • What are the people asking or answering about Neo4j also interested in
    • How is their activity distributed across tags and between questions, answers and comments
    • Which kinds of questions attract answers and which don’t
    • Looking at my own data, which answers to what kinds of questions got the highest approval rates
We need some data and a model suited to answer those questions.

Stack Overflow API


Stack Overflow offers an API to retrieve that information, it’s credential protected as usual, but there is the cool option to pre-generate an API-URL that encodes your secrets and allows you to retrieve data without sharing them. You can still control some parameters like tags, page size and page-number though.

With this API-URL below, we load the last 10 questions with the Neo4j tag.

https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf

The response should look something like this (or scroll to the far bottom).

Overall Response Structure
{ "items": [{
	"question_id": 24620768,
	"link": "http://stackoverflow.com/questions/24620768/neo4j-cypher-query-get-last-n-elements",
	"title": "Neo4j cypher query: get last N elements",
 	"answer_count": 1,
 	"score": 1,
 	.....
 	"creation_date": 1404771217,
 	"body_markdown": "I have a graph....How can I do that?",
 	"tags": ["neo4j", "cypher"],
 	"owner": {
 		"reputation": 815,
 		"user_id": 1212067,
        ....
 		"link": "http://stackoverflow.com/users/1212067/"
 	},
 	"answers": [{
 		"owner": {
 			"reputation": 488,
 			"user_id": 737080,
 			"display_name": "Chris Leishman",
            ....
 		},
 		"answer_id": 24620959,
 		"share_link": "http://stackoverflow.com/a/24620959",
        ....
 		"body_markdown": "The simplest would be to use an ... some discussion on this here:...",
 		"title": "Neo4j cypher query: get last N elements"
 	}]
 }

Graph Model


So what does the graph-model look like? We can develop it by looking at the questions we want to answer and the entities and relationships they refer to.

We need this model upfront to know where to put our data when we insert it into the graph. After all we don’t want to have loose ends.

Discover How to LOAD JSON Files from URLs AS Graph-Ready Data


Cypher Import Statement


The Cypher query to create that domain is also straightforward. You can deconstruct maps with dot notation map.key and arrays with slices array[0..4]. You’d use UNWIND to convert collections into rows and FOREACH to iterate over a collection with update statements. To create nodes and relationships we use MERGE and CREATE commands.

My friend Mark just published a blog post explaining in detail how you apply these operations to your data.

The JSON response that we retrieved from the API call is passed in as a parameter {json} to the Cypher statement, which we alias with the more handy data identifier. Then we use the aforementioned means to extract the relevant information out of the data collection of questions, treating each as q.

For each question we access the direct attributes but also related information like the owner or contained collections like tags or answers which we deconstruct in turn.

WITH {json} as data
UNWIND data.items as q
MERGE (question:Question {id:q.question_id}) ON CREATE
  SET question.title = q.title, question.share_link = q.share_link, question.favorite_count = q.favorite_count

MERGE (owner:User {id:q.owner.user_id}) ON CREATE SET owner.display_name = q.owner.display_name
MERGE (owner)-[:ASKED]->(question)

FOREACH (tagName IN q.tags | MERGE (tag:Tag {name:tagName}) MERGE (question)-[:TAGGED]->(tag))
FOREACH (a IN q.answers |
   MERGE (question)<-[:ANSWERS]-(answer:Answer {id:a.answer_id})
   MERGE (answerer:User {id:a.owner.user_id}) ON CREATE SET answerer.display_name = a.owner.display_name
   MERGE (answer)<-[:PROVIDED]-(answerer)
)

Calling Cypher with the JSON parameters


To pass in the JSON to Cypher we have to programmatically call the Cypher endpoint of the Neo4j server, which can be done via one of the many drivers for Neo4j or manually by POSTing the necessary payload to Neo4j. We can also call the Java API.

So without further ado here are our examples for a selection of different languages, drivers and APIs:

Python

We use the py2neo driver by Nigel Small to execute the statement:

import os
import requests
from py2neo import neo4j

# Connect to graph and add constraints.
neo4jUrl = os.environ.get('NEO4J_URL',"http://localhost:7474/db/data/")
graph = neo4j.GraphDatabaseService(neo4jUrl)

# Add uniqueness constraints.
neo4j.CypherQuery(graph, "CREATE CONSTRAINT ON (q:Question) ASSERT q.id IS UNIQUE;").run()

# Build URL.
apiUrl = "https://api.stackexchange.com/2.2/questions...." % (tag,page,page_size)
# Send GET request.
json = requests.get(apiUrl, headers = {"accept":"application/json"}).json()

# Build query.
query = """
UNWIND {json} AS data ....
"""

# Send Cypher query.
neo4j.CypherQuery(graph, query).run(json=json)

We also did something similar with getting tweets from the Twitter search API into Ne4oj for the OSCON conference.

Javascript

For JavaScript I want to show how to call the transactional Cypher endpoint directly, by just using the request node module.

var r=require("request");
var neo4jUrl = (env["NEO4J_URL"] || "http://localhost:7474") + "/db/data/transaction/commit";

function cypher(query,params,cb) {
  r.post({uri:neo4jUrl,
          json:{statements:[{statement:query,parameters:params}]}},
         function(err,res) { cb(err,res.body)})
}

var query="UNWIND {json} AS data ....";
var apiUrl = "https://api.stackexchange.com/2.2/questions....";

r.get({url:apiUrl,json:true,gzip:true}, function(err,res,json) {
  cypher(query,{json:json},function(err, result) { console.log(err, JSON.stringify(result))});
});

Java

With Java I want to show how to use the Neo4j embedded API to execute Cypher.

import org.apache.http.*;
import org.codehaus.jackson.map.ObjectMapper;
import org.neo4j.graphdb.*;

// somewhere in your application-scoped setup code
ObjectMapper mapper = new ObjectMapper();
HttpClient http = HttpClients.createMinimal();
GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedGraphDatabase(PATH);

// execute API request and parse response as JSON
HttpResponse response = http.execute(new HttpGet( apiUrl ));
Map json = mapper.readValue(response.getEntity().getContent(), Map.class)

// execute Cypher
String query = "UNWIND {json} AS data ....";
db.execute(query, singletonMap("json",json));

// application scoped shutdown, or JVM-shutdown-hook
db.shutdown();

Ruby

Using the neo4j-core Gem, we can talk to Neo4j Server or embedded (using jRuby) by just changing a single line of configuration.

require 'rubygems'
require 'neo4j-core'
require 'rest-client'
require 'json'

QUERY="UNWIND {json} AS data ...."
API = "https://api.stackexchange.com/2.2/questions...."

res = RestClient.get(API)
json = JSON.parse(res.to_str)

session = Neo4j::Session.open
session.query(QUERY, json: json)

Bash

Bash is of course most fun, as we have to do fancy text substitutions to make this work.

load_json.sh
#!/bin/bash
echo "Usage load_json.sh 'http://json.api.com?params=values' import_json.cypher"
echo "Use {data} as parameter in your query for the JSON data"
JSON_API="$1"
QUERY=`cat "$2"` # cypher file
JSON_DATA=`curl --compress -s -H accept:application/json -s "$JSON_API"`
POST_DATA="{\"statements\":[{\"statement\": \"$QUERY\", \"parameters\": {\"data\":\"$JSON_DATA\"}}]}"
DB_URL=${NEO4J_URL-http://localhost:7474}
curl -i -H accept:application/json -H content-type:application/json -d "$POST_DATA" -XPOST 
"$DB_URL/db/data/transaction/commit"

Example Use-Cases


Here are some simple example queries that I now can run on top of this imported dataset.

To not overload this blog post with too much information, we’ll answer our original questions in Part 2.

Find the User who was most active


MATCH (u:User)
OPTIONAL MATCH (u)-[:PROVIDED|ASKED|COMMENTED]->()
RETURN u,count(*)
ORDER BY count(*) DESC
LIMIT 5

Find co-used Tags


MATCH (t:Tag)
OPTIONAL MATCH (t)<-[:TAGGED]-(question)-[:TAGGED]->(t2)
RETURN t.name,t2.name,count(distinct question) as questions
ORDER BY questions DESC

MATCH (t:Tag)<-[r:TAGGED]->(question)
RETURN t,r,question

Conclusion


So as you can see, even with LOAD JSON not being part of the language, it’s easy enough to retrieve JSON data from an API endpoint and deconstruct and insert it into Neo4j by just using plain Cypher.

Accessing web-APIs is a simple task in all stacks and languages and JSON as transport format is ubiquitous.

Fortunately, the unfortunately lesser known capabilities of Cypher to deconstruct complex JSON documents allow us to quickly turn them into a really nice graph structure without duplication of information and rich relationships.

I encourage you to try it with your favorite web-APIs and send us your example with graph model, Cypher import query and 2-3 use-case queries that reveal some interesting insights into the data you ingested to content@neotechnology.com.


Want to learn more about graph databases? Click below to get your free copy of O’Reilly’s Graph Databases ebook and start building better apps powered by graph technologies.

Get My Free Copy


Appendix: Stack Overflow Response


{
	"items": [{
		"answers": [{
			"owner": {
				"reputation": 488,
				"user_id": 737080,
				"user_type": "registered",
				"accept_rate": 45,
				"profile_image": "https://www.gravatar.com/avatar/
ffa6eed1e8a9c1b2adb37ca88c07dede?s=128&d=identicon&r=PG",
				"display_name": "Chris Leishman",
				"link": "http://stackoverflow.com/users/737080/chris-leishman"
			},
			"tags": [],
			"comment_count": 0,
			"down_vote_count": 0,
			"up_vote_count": 2,
			"is_accepted": false,
			"score": 2,
			"last_activity_date": 1404772223,
			"creation_date": 1404772223,
			"answer_id": 24620959,
			"question_id": 24620768,
			"share_link": "http://stackoverflow.com/a/24620959",
			"body_markdown": "The simplest would be to use an ... some discussion on this here:
 http://docs.neo4j.org/chunked/stable/cypherdoc-linked-lists.html)",
			"link": "http://stackoverflow.com/questions/24620768/neo4j-cypher-query-get-last-n-elements/24620959#24620959",
			"title": "Neo4j cypher query: get last N elements"
		}],
		"tags": ["neo4j", "cypher"],
		"owner": {
			"reputation": 815,
			"user_id": 1212067,
			"user_type": "registered",
			"accept_rate": 73,
			"profile_image": "http://i.stack.imgur.com/nnyS1.png?s=128&g=1",
			"display_name": "C&#233;sar Garc&#237;a Tapia",
			"link": "http://stackoverflow.com/users/1212067/c%c3%a9sar-garc%c3%ada-tapia"
		},
		"comment_count": 0,
		"delete_vote_count": 0,
		"close_vote_count": 0,
		"is_answered": true,
		"view_count": 14,
		"favorite_count": 0,
		"down_vote_count": 0,
		"up_vote_count": 1,
		"answer_count": 1,
		"score": 1,
		"last_activity_date": 1404772230,
		"creation_date": 1404771217,
		"question_id": 24620768,
		"share_link": "http://stackoverflow.com/q/24620768",
		"body_markdown": "I have a graph that...How can I do that?",
		"link": "http://stackoverflow.com/questions/24620768/neo4j-cypher-query-get-last-n-elements",
		"title": "Neo4j cypher query: get last N elements"
	}, {
		"tags": ["neo4j", "cypher"],
		"owner": {
			"reputation": 63,
			"user_id": 845435,
			"user_type": "registered",
			"accept_rate": 67,
			"profile_image": "https://www.gravatar.com/avatar/
610458a30958c9d336ee691fa1a87369?s=128&d=identicon&r=PG",
			"display_name": "user845435",
			"link": "http://stackoverflow.com/users/845435/user845435"
		},
		"comment_count": 0,
		"delete_vote_count": 0,
		"close_vote_count": 0,
		"is_answered": false,
		"view_count": 16,
		"favorite_count": 0,
		"down_vote_count": 0,
		"up_vote_count": 0,
		"answer_count": 0,
		"score": 0,
		"last_activity_date": 1404768987,
		"creation_date": 1404768987,
		"question_id": 24620297,
		"share_link": "http://stackoverflow.com/q/24620297",
		"body_markdown": 
"I&#39;m trying to implement a simple graph db for NYC subway................Thanks!\r\n",
		"link": "http://stackoverflow.com/questions/24620297/cypher-query-with-infinite-relationship-takes-forever",
		"title": "Cypher query with infinite relationship takes forever"
	}],
	"has_more": true,
	"quota_max": 300,
	"quota_remaining": 205
}

 

Keywords:  


About the Author

Michael Hunger, Developer Relations

Michael Hunger Image

Michael Hunger has been passionate about software development for a very long time. For the last few years he has been working on the open source Neo4j graph database filling many roles.

As caretaker of the Neo4j community and ecosystem he especially loves to work with graph-related projects, users and contributors. As a developer, Michael enjoys many aspects of programming languages, learning new things every day, participating in exciting and ambitious open source projects and contributing and writing software related books and articles.


8 Comments

Marko Dragoljevic says:

Hi there.

I tried to make Bash example work with Neo4j 2.2.3 Community Edition from my MacBook Pro and it is not working.

I get the following error from Neo4j: {“results”:[],”errors”:[{“code”:”Neo.ClientError.Request.InvalidFormat”,”message”:”Unable to deserialize request: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in string value\n at [Source: HttpInputOverHTTP@190a6a94; line: 1, column: 51]”}]}

I trimmed JSON size to only 5 posts since was getting an error when using 100 that it was too big.

Any idea?

I am trying to do something else and this example is perfect example of what I need, so first need to get it working to move to my actual work.

In my work for same concept: JSON that gets injected as parameter in Cypher query I get: {“errors”: [{“code”: “Neo.ClientError.Request.InvalidFormat”, “message”: “Unable to deserialize request: Unexpected character (‘e’ (code 101)): was expecting comma to separate OBJECT entries\n at [Source: HttpInputOverHTTP@6781a645; line: 1, column: 104]”}], “results”: []} where ‘e’ is just first character inside my JSON document.

Marko Dragoljevic says:

Solved it and it works well.

There were 3 issues:
1. New lines in file containing Cyper queries were causing issues so I ended up with one gigantic line with spaces between commands. I guess Cypher parser expects this on the other side.
2. All JSON data needs to have escaped “. So I guess some data clean-up is needed before this JSON is used in BASH. I use my own JSON file I have cleaned up as pre-requisit.
3. Removed \” around $JSON_DATA in POST_DATA since data inside $JSON_DATA starts with { and no \” needed around.

Jesus Manual says:

Spring Boot, the new convention-over-configuration centric framework from the Spring team at Pivotal, marries Spring’s flexibility with conventional, common sense defaults to make application development not just fly,but pleasant!

Bob Neron says:

Hello

I am trying to compile the java example and but get an error on the following statement:
HttpClient http = HttpClients.createMinimal();

After searching on the web I found the variable type could be CloseableHttpClient instead of HttpClient .
but even with this change the list of imports and jars I supplied in the CLASSPATH do seem to be sufficient to resolve HttpClients.createMinimal() function.

I added more imports without any more success to compile the code.
import org.apache.http.impl.client.*;
import javax.net.*;

Any hint much appreciated.

Bob

Ahmed says:

Hi,
I am trying to adjust the tutorial to the newer py2neo version. It seems the syntax has changed significantly and now the following line does not work anymore:
neo4j.CypherQuery(graph, query).run(json=json)

I have tried this:
results = graph.cypher.run(query,json=json)

However, it didn’t work and produced more errors. Any suggestions on how to change the code to adjust to the new syntax?

Abhijit says:

Hi All,

I am not sure whether this is the right place to ask this but had an urgent query so asking here and expecting some clue.

I am using Neo4J 3.0.6

I need to get the JSON data for a relationship from my java code but unfortunately I am not getting now. Any pointer on this please.

Regards,
Abhijit

Tanya Murphy says:

Thank you for the intuitive StackOverflow example.

Where is the distinction between “answered” and “edited” (or “asked” and “edited”) encoded?

Best regards,
Tanya

@micahstubbs says:

thanks for the post!

I’ve made a working demo project using the Javascript example https://github.com/micahstubbs/neo4j-json-demo

the main contribution of this demo project is showing the auth header. The complete query and api link are also show, so that you can just run the script without any addition copy-pasting or guesswork 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *

Subscribe

Upcoming Event

 

From the CEO

Emil's Blog


Have a Graph Question?

Stackoverflow
Slack
Contact Us

Share your Graph Story?

Email us: content@neotechnology.com


Popular Graph Topics