As we work to resolve these issues, a key factor in successfully assembling a comprehensive tree is finding clear, intuitive ways to represent the relationships between species. Rick Ree (Field Museum of Natural History, University of Chicago), Stephen Smith (University of Michigan), and Mark Holder (University of Kansas) from our group are using graph database technology to organize the big data associated with the tree.
A graph database is well suited to showing how one species is connected to another, and social networks such as Facebook and Twitter rely on the same kind of technology. Earlier approaches to data storage were far more limiting. For instance, if you wanted to store information about your very interesting friend “Linda,” you were restricted to a fixed set of data fields, such as her email address, phone number, street address, and interests. Information about another friend, “Dave,” would be limited to the same fields and stored in isolation from Linda’s.
Social media, however, has shown that it is possible to store many different types of data and to connect them to one another. Now you can access not only Dave’s and Linda’s information but also see how your other friends are related to them, and they can see their shared connections, too.
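To make the idea of nodes and relationships concrete, here is a minimal sketch of a property graph in plain Python. It is illustrative only and is not how Neo4j or any social network actually stores data: the `Graph` class, the `"FRIEND"` relationship type, and the friend “Sam” are hypothetical. Each node can carry whatever fields it needs (Linda has an email address, Dave only a phone number), and shared connections fall out of a simple set intersection.

```python
from collections import defaultdict


class Graph:
    """Hypothetical minimal property graph: nodes hold free-form properties,
    and typed relationships link node ids."""

    def __init__(self):
        self.nodes = {}                # node id -> dict of properties
        self.edges = defaultdict(set)  # (node id, relationship type) -> neighbour ids

    def add_node(self, node_id, **props):
        # No fixed schema: each node stores only the properties it was given.
        self.nodes[node_id] = props

    def relate(self, a, b, rel_type):
        # Store the relationship in both directions (an undirected link).
        self.edges[(a, rel_type)].add(b)
        self.edges[(b, rel_type)].add(a)

    def neighbours(self, node_id, rel_type):
        return self.edges[(node_id, rel_type)]

    def shared(self, a, b, rel_type):
        # Connections that a and b have in common.
        return self.neighbours(a, rel_type) & self.neighbours(b, rel_type)


g = Graph()
g.add_node("you")
g.add_node("Linda", email="linda@example.com", interests=["birding"])
g.add_node("Dave", phone="555-0100")
g.add_node("Sam")
g.relate("you", "Linda", "FRIEND")
g.relate("you", "Dave", "FRIEND")
g.relate("Linda", "Sam", "FRIEND")
g.relate("Dave", "Sam", "FRIEND")

print(sorted(g.shared("Linda", "Dave", "FRIEND")))  # ['Sam', 'you']
```

Because relationships are stored as first-class links rather than as columns in a fixed record, asking “who do Linda and Dave both know?” does not require any field to be defined in advance.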
This approach to data management is well suited to the Open Tree of Life. By adapting and specializing the types of data stored, the same nodes and relationships that work so well for social media can store information about species and how they are related. Looking up a species of bear will return not just information about that bear but also closely related bear species, the most recent common ancestor it shares with them, and connections all the way back to the very first bear species.
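The bear lookup described above can be sketched as a walk along parent links. This is a minimal illustration under stated assumptions, not Open Tree’s schema or Neo4j’s query language: the tree below is a hypothetical, heavily simplified bear tree (intermediate ranks are omitted and the relationships within Ursus are left unresolved), stored as a plain Python dictionary mapping each taxon to its parent.

```python
# Hypothetical, simplified bear tree for illustration only (NOT Open Tree data).
# Each taxon points to its parent; "Ursidae" (the bear family) is the root.
parent = {
    "Ursinae": "Ursidae",
    "Tremarctos ornatus": "Ursidae",      # spectacled bear
    "Ailuropoda melanoleuca": "Ursidae",  # giant panda
    "Ursus": "Ursinae",
    "Ursus arctos": "Ursus",              # brown bear
    "Ursus maritimus": "Ursus",           # polar bear
    "Ursus americanus": "Ursus",          # American black bear
}


def lineage(taxon):
    """Follow parent links from a taxon up to the root, returning the path."""
    path = [taxon]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def mrca(a, b):
    """Most recent common ancestor: the first node on a's lineage
    that also appears on b's lineage."""
    ancestors_b = set(lineage(b))
    for node in lineage(a):
        if node in ancestors_b:
            return node
    return None


def relatives(taxon):
    """Other taxa attached to the same immediate parent node."""
    p = parent.get(taxon)
    return sorted(t for t, q in parent.items() if q == p and t != taxon)


print(lineage("Ursus arctos"))  # ['Ursus arctos', 'Ursus', 'Ursinae', 'Ursidae']
print(mrca("Ursus arctos", "Tremarctos ornatus"))  # Ursidae
print(relatives("Ursus arctos"))  # ['Ursus americanus', 'Ursus maritimus']
```

In this representation, adding a newly described species means inserting one new entry (one node and one link to its parent), and none of the existing entries need to change.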
These big data methods also make it much easier to add species to the tree in the future. With over ten million species still to be discovered and identified, the ability to expand the Open Tree is critical. Graph database technology will allow researchers and scientists to make those additions easily, without compromising the rest of the tree. Neo4j is the graph database that forms the back end of the Open Tree of Life.