Explore the Neo4j Entity Resolution Sandbox


Entity resolution (ER) is the process of analyzing and disambiguating data to identify multiple digital records that represent the same real-world entity, such as a person, organization, place, or other kind of object.

For instance, a user can have more than one account or profile on an e-commerce website, each with a different email address and a slightly different form or abbreviation of their name.

A human may be able to tell whether such records belong to the same underlying entity. But given the number of possible record combinations and potential matches, an automated solution is required, which is where entity resolution systems come into play.
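To make the matching problem concrete, here is a minimal Python sketch of attribute-based matching on names. It assumes each record is a dict with hypothetical `id`, `name`, and `email` fields, and it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure; a production ER system would typically use a dedicated string metric and avoid comparing every pair. Note that brute-force comparison of n records takes n(n-1)/2 pair checks, which is why the number of combinations quickly gets out of hand.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    # Lowercase, turn periods into spaces, and collapse runs of whitespace,
    # so "J. Smith" and "j smith" normalize to the same string
    return " ".join(name.lower().replace(".", " ").split())


def name_similarity(a: str, b: str) -> float:
    # SequenceMatcher.ratio() returns a score in [0, 1]; 1.0 means identical
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_matches(records: list, threshold: float = 0.8) -> list:
    """Compare every pair of records and keep pairs whose names look alike."""
    matches = []
    for r1, r2 in combinations(records, 2):  # n * (n - 1) / 2 pairs
        score = name_similarity(r1["name"], r2["name"])
        if score >= threshold:
            matches.append((r1["id"], r2["id"], round(score, 2)))
    return matches


# Hypothetical user records; 1 and 2 are meant to be one person with two accounts
records = [
    {"id": 1, "name": "Jonathan Smith", "email": "jon.smith@example.com"},
    {"id": 2, "name": "Jonathan Smyth", "email": "jsmith99@example.org"},
    {"id": 3, "name": "Maria Garcia", "email": "maria.g@example.com"},
]
print(candidate_matches(records))  # [(1, 2, 0.93)]
```

Only records 1 and 2 clear the threshold here, even though their email addresses differ. The 0.8 threshold is an arbitrary value chosen for illustration.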


Use Cases

Entity resolution has use cases across many sectors.

Life Science and Healthcare Industries

Life science and healthcare organizations have a particularly strong need for data linking. For example, a healthcare organization can use entity resolution to link a patient's records from a number of sources, matching data from hospitals, clinics, labs, insurance claims, and social media profiles to create a single unified profile for each patient. This can help clinicians tailor precise and effective treatment. Life science organizations can use ER to connect various entities, research results, input data sets, and so on, which can facilitate R&D.

Insurance and Financial Services Organizations

Financial services and insurance companies often struggle with siloed datasets. Different products or categories maintain their data in separate systems and databases, so it is difficult to reconcile a customer's choices, track record, credit ratings, and so on in a central platform. ER can enable these companies to perform record linkage across the different data sets and produce a unified view of each customer's state and needs.

Digital Marketing and Content Recommendation Businesses

Effective marketing and recommendation schemes cannot be built from disconnected or fragmented data sets. Record linking, machine learning, and analytics can be very helpful in producing effective marketing content.

Identifying duplicate customer records is another area of marketing and CRM that needs to be addressed, and ER can be very effective in such use cases.

Graphs Can Come in Handy

Graphs can benefit the entity resolution process by using not just the attributes of the entities but also their context: behavior, social relationships, attributes shared with other entities, and connections to people, objects, locations, and events (POLE).
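As a minimal plain-Python sketch of how shared context can link records that look different on the surface, the function below treats records that share a value for any of a few hypothetical keys (`email`, `phone`, `address`) as connected, and groups them with a union-find structure. This amounts to finding connected components in a graph where records and attribute values are nodes. In Neo4j itself, this grouping would more naturally be done with a Cypher query or with the GDS Weakly Connected Components algorithm; the field names and data here are assumptions for illustration.

```python
from collections import defaultdict


def link_by_shared_attributes(records: list, keys=("email", "phone", "address")) -> list:
    """Cluster record ids so that records sharing any key value end up together."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Walk up to the root, halving the path along the way
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # (key, value) -> id of the first record carrying that value
    for r in records:
        for key in keys:
            value = r.get(key)
            if value is None:
                continue
            if (key, value) in first_seen:
                union(r["id"], first_seen[(key, value)])
            else:
                first_seen[(key, value)] = r["id"]

    clusters = defaultdict(set)
    for rid in parent:
        clusters[find(rid)].add(rid)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical records: u1 and u2 share a phone, u2 and u3 share an email
records = [
    {"id": "u1", "email": "ann@example.com", "phone": "555-0100"},
    {"id": "u2", "email": "a.lee@example.org", "phone": "555-0100"},
    {"id": "u3", "email": "a.lee@example.org", "address": "12 Elm St"},
    {"id": "u4", "email": "bob@example.com"},
]
print(link_by_shared_attributes(records))  # [['u1', 'u2', 'u3'], ['u4']]
```

Records `u1` and `u3` share no attribute directly, but both connect to `u2`, so all three land in one cluster. That transitive linking is the kind of context a graph makes easy to exploit.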

Example Use Case

We have crafted an example use case to demonstrate entity resolution.

We use the example of a fictional online movie streaming platform. For ease of understanding, we use only two datasets: movies and users. Movies have different genres, and users can stream and watch movies on the platform. One or more members of a family may use the same profile or different profiles on the platform.

Data Model

A user can have one or more accounts on the movie streaming platform, with slightly different names or abbreviations and a different email address configured for each profile.

We perform entity resolution over the users' data to identify such users. We also link users who belong to the same account (or group/family). We then use this linking to provide more effective recommendations to individual users.
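Once users are linked into groups, one simple way to use that linking is to suggest what other members of the same group have watched. The sketch below assumes a hypothetical `groups` mapping (user id to group id, as produced by a linking step) and a `watched` mapping (user id to a set of movie titles). It illustrates the idea only and is not the query used in the sandbox.

```python
from collections import Counter


def recommend_from_group(user: str, groups: dict, watched: dict, top_n: int = 3) -> list:
    """Recommend unseen movies watched by other members of the user's group."""
    group = groups[user]
    seen = watched.get(user, set())
    counts = Counter()
    for other, other_group in groups.items():
        if other != user and other_group == group:
            # Count each unseen movie once per group member who watched it
            counts.update(m for m in watched.get(other, set()) if m not in seen)
    # Most-watched first; ties broken alphabetically for a stable order
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [movie for movie, _ in ranked[:top_n]]


# Hypothetical data: "alice" and "al_work" are linked accounts in family fam1
groups = {"alice": "fam1", "al_work": "fam1", "bob": "fam1", "carol": "fam2"}
watched = {
    "alice": {"Heat"},
    "al_work": {"Heat", "Alien"},
    "bob": {"Alien", "Up"},
    "carol": {"Coco"},
}
print(recommend_from_group("alice", groups, watched))  # ['Alien', 'Up']
```

Ranking by how many group members watched a movie is just one possible design choice; weighting by genre preference or recency would be equally reasonable.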

Ready-Made Neo4j Sandbox to Explore Entity Resolution

We have set up a preconfigured sandbox that walks you through this entity resolution use case. Neo4j Sandbox is a free online tool from Neo4j for trying its graph database without installing anything locally.

The sandbox includes an easy-to-follow browser guide that walks you through the following steps:

  • Relate: Establish connections (relationships) between entities
  • Explore: Perform basic querying with Cypher on the loaded data
  • ER: Perform entity resolution based on similarity and link the matching records
  • Recommend: Generate recommendations based on user similarities or preferences
  • Additional: Try a couple of preference-based similarity and recommendation examples using Neo4j Graph Data Science (GDS)
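For preference-based similarity, GDS Node Similarity uses Jaccard similarity by default: the size of the intersection of two sets divided by the size of their union. As a hedged, library-free illustration of that metric, the sketch below compares users by hypothetical sets of preferred genres. In the sandbox these sets would presumably come from relationships in the graph rather than from Python dicts.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar_users(target: str, prefs: dict, k: int = 2) -> list:
    """Return the k users whose preferred genres overlap most with target's."""
    scores = [
        (other, jaccard(prefs[target], genres))
        for other, genres in prefs.items()
        if other != target
    ]
    # Highest similarity first; ties broken by user name for a stable order
    scores.sort(key=lambda s: (-s[1], s[0]))
    return scores[:k]


# Hypothetical genre preferences for illustration only
prefs = {
    "alice": {"Action", "Sci-Fi", "Thriller"},
    "bob": {"Action", "Sci-Fi"},
    "carol": {"Comedy", "Romance"},
    "dave": {"Thriller", "Drama"},
}
print(most_similar_users("alice", prefs))  # [('bob', 0.6666666666666666), ('dave', 0.25)]
```

The most similar users found this way can then feed a recommendation step, for example by suggesting movies they watched that the target user has not.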

References

For an explanation of entity resolution and another interesting study, see:

Exploring Supervised Entity Resolution in Neo4j

The full source code of this example, to try on your local machine:

GitHub – neo4j-graph-examples/entity-resolution: Entity resolution, also known as data matching or record linkage, is the task of finding records that refer to the same or similar real-world entity, whether they sit in the same data set or in different ones. Record linking is necessary when joining entities that are similar and may or may not share common identifiers. Neo4j offers various advantages for performing entity resolution and record linking. This repository covers such a use case: linking similar user accounts for analytics and better recommendations.


Explore the Neo4j Entity Resolution Sandbox was originally published in Neo4j Developer Blog on Medium.