GraphGists

NoSQL - How to generate histograms for ranges of data

Question

Our company needs to store and compute analytics on the content creation, review/approval, and publishing workflow for documents. We are considering something like Amazon SimpleDB.

We will store "events" that correspond to actions users take in the system. For instance:

[User B] requested that [document B] be reviewed by [User A] at [Time]
[User A] approved [document B] at [Time]
[User B] edited [document B] at [Time]
[User B] published [document B] at [Time]

We then want to create graphs (histograms or line plots) of this activity over given time periods. For instance:

  • Edits vs Time

  • Approvals vs Time

  • Publishes vs Time

  • Approvals vs Publishes vs Time

In SQL, I assume this would be done by grouping results into "buckets". However, I am having a hard time figuring out how to do this with a NoSQL database like Amazon SimpleDB without batch processing in Hadoop/MapReduce. This has to work in real time, so batch processing is out of the question.

We are also looking at Neo4j, so a Neo4j solution would be welcome as well.

Thanks
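One way to meet the real-time requirement in Neo4j is to do the bucketing at write time instead of query time: each event node is linked to a node for its day as soon as it is stored, so the per-day queries only follow existing relationships and never have to group raw timestamps. A minimal sketch for an edit event follows. Only the Content and Edit labels, the document and day properties, and the OF_CONTENT and ON_DAY relationships come from the queries in this gist; the Day label, the at property and the $document, $day and $timestamp parameters are hypothetical.

```cypher
// Write-time bucketing: link each new edit to its document and to the node for its day.
// Hypothetical parameters: $document (document identifier), $day (ISO-8601 date
// string such as "2013-05-14") and $timestamp (the exact time of the edit).
MERGE (c:Content {document: $document})
MERGE (d:Day {day: $day})
CREATE (c)<-[:OF_CONTENT]-(e:Edit {at: $timestamp})-[:ON_DAY]->(d)
RETURN e
```

MERGE on its own can create duplicate day nodes when two writes race, so a uniqueness constraint on the day property of Day nodes is advisable; the exact constraint syntax depends on the Neo4j version.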

Edits per day

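// One row per day: the number of edits and the edit nodes themselves, in date order
MATCH (e:Edit)-[:ON_DAY]->(d)
RETURN d.day AS day, count(e) AS edits, collect(e) AS editNodes
ORDER BY day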
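The question asks for histograms over given time periods, and a histogram usually needs a bar (possibly zero) for every day in the window. The sketch below restricts the buckets to a window and reports zero for days without edits. It assumes day nodes carry a hypothetical Day label, exist for every day in the window, and store day as an ISO-8601 string so that string comparison follows date order; $from and $to are hypothetical parameters.

```cypher
// Edits per day within [$from, $to], including days with no edits.
// Assumes a :Day node exists for every day in the window (hypothetical label).
MATCH (d:Day)
WHERE d.day >= $from AND d.day <= $to
OPTIONAL MATCH (e:Edit)-[:ON_DAY]->(d)
// count(e) ignores the nulls produced by OPTIONAL MATCH, so empty days report 0
RETURN d.day AS day, count(e) AS edits
ORDER BY day
```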

Edits per document and day

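// One row per (day, document) pair, with the ids of the edits in that bucket
MATCH (c:Content)<-[:OF_CONTENT]-(e:Edit)-[:ON_DAY]->(d)
RETURN d.day AS day, c.document AS document, count(e) AS edits, collect(id(e)) AS editIds
ORDER BY day, document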
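For a comparison chart such as Approvals vs Publishes vs Time, both series can come from one query that returns one row per day. This sketch assumes approval and publish events are stored the same way as the edits, as nodes with hypothetical Approval and Publish labels linked through ON_DAY to hypothetical Day nodes; $from and $to are hypothetical parameters.

```cypher
// Approvals vs publishes per day within [$from, $to], one row per day.
// :Day, :Approval and :Publish are hypothetical labels modelled on the :Edit nodes.
MATCH (d:Day)
WHERE d.day >= $from AND d.day <= $to
OPTIONAL MATCH (a:Approval)-[:ON_DAY]->(d)
// Aggregate approvals before matching publishes to avoid a per-day cross product
WITH d, count(a) AS approvals
OPTIONAL MATCH (p:Publish)-[:ON_DAY]->(d)
RETURN d.day AS day, approvals, count(p) AS publishes
ORDER BY day
```

The WITH between the two OPTIONAL MATCH clauses matters: matching both event types before aggregating would pair every approval with every publish on the same day and inflate both counts.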