We often want to gain insight from analytical processing of our operational data. For example, we may want to leverage the connectedness of customers to products and their networks to identify recommendation opportunities. However, running analytical workloads on operational databases is seldom a good idea, so there will usually be separate databases for each task. We may also want to stream insight as and when it becomes available.
This separation brings new challenges of its own: How do we keep the data on both database instances in sync? How do we stream results from our analysis to our transactional database as and when they’re generated?
In this talk we will describe a scenario in which graph databases, deployed as a cluster with read replicas, serve both the operational workload and the analytical work. We will show how this architectural pattern can be combined with Kafka to stream analytical results back to the operational databases as soon as they’re available, whilst ensuring all of the databases are up to date with the same data. The example uses the newly released Apache Kafka plugin for Neo4j.
Ljubica is part of Neo4j’s field team, based in London. She has a varied background covering development, project management and architecture, in a diverse range of industries from ecology to finance. Ljubica is a data geek with a particular interest in data lineage and associated areas.