By Michael Hunger and Philip Rathle

It used to be that databases were tasked only with digitizing forms and automating business processes. The data was often tabular (take an accounting ledger, for example), and the processes being modeled were reasonably static. Today, the types of data that we are interested in are much more diverse and dynamic. We want to capture information about all sorts of things happening around us, which requires us to deal with dynamic systems that often generate large quantities of semi-structured, volatile data, where the connections between the discrete data points are as important as the data points themselves.
Broadly speaking, the trends at play are as follows:
- Volume and Velocity. The volume of data created and handled is growing exponentially. Eric Schmidt famously said in 2010 that every two days we create as much data as was created in total from the beginning of written history through 2003. The pace continues to accelerate as the millions of systems running our world capture more and more data from more and more sources.
- Variety. The shape of data is becoming more complex, less structured, less predictable and more heterogeneous. This reflects an increasing variety of data sources, together with the need to model real-world systems such as social networks, biological systems and the World Wide Web.
- Connectedness. Many of the real-world systems that businesses are seeking to model are highly interconnected. These rich, connected data sets bring an implicit need to understand and navigate relationships, which affect both the individual data points and the overall system. James Fowler, co-author of Connected, is among the researchers who are discovering that one can often understand a person better by learning about those around them than by learning about them as an individual, and that it is not just first-order relationships that impact behavior, but second- and third-order relationships as well.