Neo4j, Cassandra, and FluidDB represent a breed of databases that swiftly search social networking data. Here is what you should know about each of them.


Everyone loves social networks these days. Homeland security wants to track which terrorists know one another. Laundry companies want to know your friends so that they can get you to pass along the good word about the new starch. Content mongers, meanwhile, believe that they can link together similar movie, television, and music preferences among users so that people who love “Die Hard” can be automatically informed that they might want to check out “Die Hard 2: Die Harder.”

At the heart of these problems, and of dozens of other ideas springing from the foreheads of marketing directors everywhere, are graph databases. (Computer scientists use the word “graph” to describe collections of objects and the links among them.) Using graph databases instead of traditional relational databases to store social-network-type data structures can yield faster answers to important questions, such as what kind of donut your friend’s friend’s friend prefers, or whether someone from your “Labor Day” DVD was in a movie with someone who was in a movie with Kevin Bacon.

Pure graph databases aren’t the only answer. Simple, schema-free relational databases are emerging that rival graph databases at cranking out quick answers to those questions. They achieve this feat by not wasting time fretting over transactions, focusing instead on pre-computing answers.

Moreover, certain types of simple queries that a social network might require are better suited to the indexing and table balancing built into a relational database. For example, if you have links between people stored in an indexed table with the ID numbers in two columns, it’s easy for a relational database to find everyone who is a friend of Bob or everyone who follows Chris. The graph structure doesn’t really help with such queries.

I looked at three databases, graph and relational alike, that are geared toward social networks: Neo4j, Cassandra, and FluidDB. All three are relatively young but hold promise in helping organizations connect the dots among their users.
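To make the two-column link table concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration only, not how any of the three products stores data: the table name `links`, the columns `src` and `dst`, the helper functions, and the sample ID numbers are all assumptions, and links are treated as directed (follower to followee).

```python
import sqlite3

def build_links(pairs):
    """Create an in-memory table of directed links (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE links (src INTEGER NOT NULL, dst INTEGER NOT NULL)")
    # One index per column, so lookups in either direction avoid a full scan.
    conn.execute("CREATE INDEX idx_links_src ON links (src)")
    conn.execute("CREATE INDEX idx_links_dst ON links (dst)")
    conn.executemany("INSERT INTO links (src, dst) VALUES (?, ?)", pairs)
    return conn

def following(conn, person_id):
    """Everyone person_id links to: a single indexed lookup on src."""
    rows = conn.execute(
        "SELECT dst FROM links WHERE src = ? ORDER BY dst", (person_id,))
    return [r[0] for r in rows]

def followers(conn, person_id):
    """Everyone who links to person_id: a single indexed lookup on dst."""
    rows = conn.execute(
        "SELECT src FROM links WHERE dst = ? ORDER BY src", (person_id,))
    return [r[0] for r in rows]

def three_hops(conn, person_id):
    """A friend's friend's friend: needs one self-join per extra hop."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.dst
        FROM links AS a
        JOIN links AS b ON b.src = a.dst
        JOIN links AS c ON c.src = b.dst
        WHERE a.src = ?
        ORDER BY c.dst
        """,
        (person_id,))
    return [r[0] for r in rows]

# Made-up sample data: ID numbers standing in for people.
BOB, CHRIS, DANA, EVE = 1, 2, 3, 4
conn = build_links([(BOB, CHRIS), (DANA, CHRIS), (CHRIS, EVE), (EVE, DANA)])
```

With this sample data, `followers(conn, CHRIS)` is one indexed lookup, while `three_hops(conn, BOB)` has to join the table against itself once for each additional hop. That growing chain of joins is the kind of multi-step question graph databases are designed to answer by walking links directly.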
