NoSQL - How to generate histograms for ranges of data
Question
Our company needs to store and compute analytics on the content creation, review/approval, and publishing workflow for documents. We are looking at something like Amazon SimpleDB.
We will store "events" which correspond to actions that users take in the system. For instance:
- [User B] requested [document B] be reviewed at [Time] by [User A]
- [User A] approved [document B] at [Time]
- [User B] edited [document B] at [Time]
- [User B] published [document B] at [Time]
Then we want to be able to create graphs (histogram/line plot) of this activity for given time periods. For instance:
- Edits vs Time
- Approvals vs Time
- Publishes vs Time
- Approvals vs Publishes vs Time
In SQL I assume this would be done by grouping results into time "buckets". However, I am having a hard time figuring out how to do this with a NoSQL database like Amazon SimpleDB without batching the processing using Hadoop/MapReduce. This has to be real time, so batch processing is out of the question.
We are also looking at Neo4j, so if someone has a solution for Neo4j I would be interested in that as well.
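One common way to get bucketed counts without a batch job is to increment a counter per event type and time bucket at write time, so that a graph only has to read a handful of keys. The following is a minimal sketch of that idea, not a SimpleDB or Neo4j implementation: the `counters` object is a hypothetical in-memory stand-in for the datastore, the names `bucket_key`, `record_event` and `series` are illustrative, the sample events are made up, and it assumes the chosen store can increment a counter safely under concurrent writes.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for per-bucket counters kept in a datastore.
counters = Counter()

# strftime patterns that truncate a timestamp to the start of its bucket.
BUCKET_FORMATS = {"hour": "%Y-%m-%dT%H", "day": "%Y-%m-%d"}


def bucket_key(event_type, ts, granularity):
    """Build a counter key such as 'edited:2024-01-15' for one event."""
    return f"{event_type}:{ts.strftime(BUCKET_FORMATS[granularity])}"


def record_event(event_type, ts):
    """Increment the hourly and daily buckets at the moment the event is stored."""
    for granularity in BUCKET_FORMATS:
        counters[bucket_key(event_type, ts, granularity)] += 1


def series(event_type, buckets):
    """Return (bucket, count) pairs for one event type; empty buckets count as 0."""
    return [(b, counters[f"{event_type}:{b}"]) for b in buckets]


# Made-up sample events, for illustration only.
events = [
    ("edited", datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)),
    ("edited", datetime(2024, 1, 15, 9, 45, tzinfo=timezone.utc)),
    ("approved", datetime(2024, 1, 15, 11, 0, tzinfo=timezone.utc)),
    ("edited", datetime(2024, 1, 16, 14, 0, tzinfo=timezone.utc)),
    ("published", datetime(2024, 1, 16, 15, 0, tzinfo=timezone.utc)),
]
for event_type, ts in events:
    record_event(event_type, ts)

days = ["2024-01-15", "2024-01-16", "2024-01-17"]
edits_per_day = series("edited", days)
approvals_per_day = series("approved", days)
```

Plotting "Approvals vs Publishes vs Time" would then amount to calling `series` once per event type over the same list of buckets. The trade-off is that the bucket sizes have to be chosen up front, since each event is counted into fixed buckets when it is written.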
Thanks
Answer
Edits per document and day:
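// Match each Edit node with the Content it changed and the day node it occurred on.
MATCH (c:Content)<-[:OF_CONTENT]-(e:Edit)-[:ON_DAY]->(d)
// d.day is the only grouping key, so count(e) and collect(...) aggregate over all edits of that day.
RETURN d.day, count(e), collect({edit:id(e), content:c.document})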
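To make the shape of that result easier to picture, here is a rough in-memory equivalent of the query's aggregation. It is only a sketch under assumptions: the `edits` rows and the function name `edits_per_day` are hypothetical, each row stands in for one Edit node with its linked document and day, and the output is sorted by day for readability even though the query above does not specify an order.

```python
from collections import defaultdict

# Hypothetical rows, one per Edit node: (edit id, document, day).
edits = [
    (1, "document A", "2024-01-15"),
    (2, "document B", "2024-01-15"),
    (3, "document B", "2024-01-16"),
]


def edits_per_day(rows):
    """Group edits by day and return (day, count, details) tuples."""
    grouped = defaultdict(list)
    for edit_id, document, day in rows:
        grouped[day].append({"edit": edit_id, "content": document})
    # Sorting is an addition for readability; it is not part of the query.
    return [(day, len(items), items) for day, items in sorted(grouped.items())]


result = edits_per_day(edits)
```

Each tuple corresponds to one row of the query: the day, the number of edits on that day, and the list of edit/document pairs that were collected.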