Neo4j includes a growing, open library of high-performance graph algorithms that reveal the hidden patterns and structures in your connected data.
In this series on graph algorithms, we’ll discuss the value of graph algorithms and what they can do for you. Last week, we explored how data connections drive future discoveries and how to streamline those data discoveries with graph analytics.
This week, we’ll take a detailed look at the many graph algorithms available in Neo4j and what they do.
15 Graph Algorithms Optimized for the Neo4j Graph Platform
Using Neo4j graph algorithms, you’ll have the means to understand, model and predict complicated dynamics such as the flow of resources or information, the pathways that contagions or network failures spread, and the influences on and resiliency of groups.
And because Neo4j brings together analytics and transaction operations in a native graph platform, you’ll not only uncover the inner nature of real-world systems for new discoveries, but also develop and deploy graph-based solutions faster and have easy-to-use, streamlined workflows. That’s the power of an optimized approach.
Here is a list of the many algorithms that Neo4j uses in its graph analytics platform, along with an explanation of what they do.
Traversal & Pathfinding Algorithms
1. Parallel Breadth-First Search (BFS)What It Does: Traverses a tree data structure by fanning out to explore the nearest neighbors and then their sub-level neighbors. It’s used to locate connections and is a precursor to many other graph algorithms.
BFS is preferred when the tree is less balanced or the target is closer to the starting point. It can also be used to find the shortest path between nodes or avoid the recursive processes of depth-first search.
How It’s Used: Breadth-First Search can be used to locate neighbor nodes in peer-to-peer networks like BitTorrent, GPS systems to pinpoint nearby locations and social network services to find people within a specific distance.
2. Parallel Depth-First Search (DFS)What It Does: Traverses a tree data structure by exploring as far as possible down each branch before backtracking. It’s used on deeply hierarchical data and is a precursor to many other graph algorithms. Depth-First Search is preferred when the tree is more balanced or the target is closer to an endpoint.
How It’s Used: Depth-First Search is often used in gaming simulations where each choice or action leads to another, expanding into a tree-shaped graph of possibilities. It will traverse the choice tree until it discovers an optimal solution path (e.g., win).
3. Single-Source Shortest PathWhat It Does: Calculates a path between a node and all other nodes whose summed value (weight of relationships such as cost, distance, time or capacity) to all other nodes are minimal.
How It’s Used: Single-Source Shortest Path is often applied to automatically obtain directions between physical locations, such as driving directions via Google Maps. It’s also essential in logical routing such as telephone call routing (least-cost routing).
4. All-Pairs Shortest PathWhat It Does: Calculates a shortest path forest (group) containing all shortest paths between the nodes in the graph. Commonly used for understanding alternate routing when the shortest route is blocked or becomes sub-optimal.
How It’s Used: All-Pairs Shortest Path is used to evaluate alternate routes for situations such as a freeway backup or network capacity. It’s also key in logical routing to offer multiple paths; for example, call routing alternatives.
5. Minimum Weight Spanning Tree (MWST)What It Does: Calculates the paths along a connected tree structure with the smallest value (weight of the relationship such as cost, time or capacity) associated with visiting all nodes in the tree. It’s also employed to approximate some NP-hard problems such as the traveling salesman problem and randomized or iterative rounding.
How It’s Used: Minimum Weight Spanning Tree is widely used for network designs: least-cost logical or physical routing such as laying cable, fastest garbage collection routes, capacity for water systems, efficient circuit designs and much more. It also has real-time applications with rolling optimizations such as processes in a chemical refinery or driving route corrections.
6. PageRankWhat It Does: Estimates a current node’s importance from its linked neighbors and then again from their neighbors. A node’s rank is derived from the number and quality of its transitive links to estimate influence. Although popularized by Google, it’s widely recognized as a way of detecting influential nodes in any network.
How It’s Used: PageRank is used in quite a few ways to estimate importance and influence. It’s used to suggest Twitter accounts to follow and for general sentiment analysis.
PageRank is also used in machine learning to identify the most influential features for extraction. In biology, it’s been used to identify which species extinctions within a food web would lead to biggest chain-reaction of species death.
7. Degree CentralityWhat It Does: Measures the number of relationships a node (or an entire graph) has. It’s broken into indegree (flowing in) and outdegree (flowing out) where relationships are directed.
How It’s Used: Degree Centrality looks at immediate connectedness for uses such as evaluating the near-term risk of a person catching a virus or hearing information. In social studies, indegree of friendship can be used to estimate popularity and outdegree as gregariousness.
8. Closeness CentralityWhat It Does: Measures how central a node is to all its neighbors within its cluster. Nodes with the shortest paths to all other nodes are assumed to be able to reach the entire group the fastest.
How It’s Used: Closeness centrality is applicable in a number of resources, communication and behavioral analysis, especially when interaction speed is significant. It has been used to identifying the best location of new public services for maximum accessibility.
In social network analysis, it is used to find people with the ideal social network location for faster dissemination of information.
9. Betweenness CentralityWhat It Does: Measures the number of shortest paths (first found with Breadth-First Search) that pass through a node. Nodes that most frequently lie on shortest paths have higher betweenness centrality scores and are the bridges between different clusters. It is often associated with the control over the flow of resources and information.
How It’s Used: Betweenness Centrality applies to a wide range of problems in network science and is used to pinpoint bottlenecks or likely attack targets in communication and transportation networks. In genomics, it has been used to understand the control certain genes have in protein networks for improvements such as better drug- disease targeting.
Betweenness Centrality has also be used to evaluate information flows between multiplayer online gamers and expertise sharing communities of physicians.
Community Detection Algorithms
This category is also known as clustering algorithms or partitioning algorithms.
10. Label PropagationWhat It Does: Spreads labels based on neighborhood majorities as a means of inferring clusters. This extremely fast graph partitioning requires little prior information and is widely used in large-scale networks for community detection. It’s a key method for understanding the organization of a graph and is often a primary step in other analysis.
How It’s Used: Label Propagation has diverse applications from understanding consensus formation in social communities to identifying sets of proteins that are involved together in a process (functional modules) for biochemical networks. It’s also used in semi- and unsupervised machine learning as an initial preprocessing step.
11. Strongly ConnectedWhat It Does: Locates groups of nodes where each node is reachable from every other node in the same group following the direction of relationships. It’s often applied from a depth-first search.
How It’s Used: Strongly Connected is often used to enable running other algorithms independently on an identified cluster. As a preprocessing step for directed graphs, it helps quickly identify disconnected groups. In retail recommendations, it helps identify groups with strong affinities that then are used for suggesting commonly preferred items to those within that group who have not yet purchased the item.
12. Union-Find / Connected Components / Weakly ConnectedWhat It Does: Finds groups of nodes where each node is reachable from any other node in the same group, regardless of the direction of relationships. It provides near constant-time operations (independent of input size) to add new groups, merge existing groups and determine whether two nodes are in the same group
How It’s Used: Union-Find / Connected Components is often used in conjunction with other algorithms, especially for high-performance grouping. As a preprocessing step for undirected graphs, it helps quickly identify disconnected groups.
13. Louvain ModularityWhat It Does: Measures the quality (i.e., presumed accuracy) of a community grouping by comparing its relationship density to a suitably defined random network. It’s often used to evaluate the organization of complex networks and community hierarchies in particular. It’s also useful for initial data preprocessing in unsupervised machine learning.
How It’s Used: Louvain is used to evaluate social structures in Twitter, LinkedIn and YouTube. It’s used in fraud analytics to evaluate whether a group has just a few bad behaviors or is acting as a fraud ring that would be indicated by a higher relationship density than average. Louvain revealed a six-level customer hierarchy in a Belgian telecom network.
14. Local Clustering Coefficient / Node Clustering CoefficientWhat It Does: For a particular node, it quantifies how close its neighbors are to being a clique (every node is directly connected to every other node). For example, if all your friends knew each other directly, your local clustering coefficient would be 1. Small values for a cluster would indicate that although a grouping exists, the nodes are not tightly connected.
How It’s Used: Local Cluster Coefficient is important for estimating resilience by understanding the likelihood of group coherence or fragmentation. Analysis of a European power grid using this method found that clusters with sparsely connected nodes were more resilient against widespread failures.
15. Triangle-Count and Average Clustering CoefficientWhat It Does: Measures how many nodes have triangles and the degree to which nodes tend to cluster together. The average clustering coefficient is 1 when there is a clique, and 0 when there are no connections. For the clustering coefficient to be meaningful, it should be significantly higher than a version of the network where all of the relationships have been shuffled randomly.
How It’s Used: The Average Clustering Coefficient is often used to estimate whether a network might exhibit “small-world” behaviors which are based on tightly knit clusters. It’s also a factor for cluster stability and resiliency. Epidemiologists have used the average clustering coefficient to help predict various infection rates for different communities.
The world is driven by connections. Neo4j graph analytics reveals the meaning of those connections using practical, optimized graph algorithms including the ones detailed above.
This concludes our series on graph algorithms in Neo4j. We hope these algorithms help you make sense of your connected data in more meaningful and effective ways.
Learn about the power of graph algorithms in the O’Reilly book,
Graph Algorithms: Practical Examples in Apache Spark and Neo4j by the authors of this article. Click below to get your free ebook copy.
Get the O’Reilly Ebook
Catch up with the rest of the graph algorithms in Neo4j blog series:
About the Author
Amy E. Hodler , Analytics & AI Program Manager
In her career, Amy has consistently helped teams break into new markets at startups and large companies including EDS, Microsoft and Hewlett-Packard (HP). She most recently comes from Cray Inc., where she was the analytics and artificial intelligence market manager.
Amy has a love for science and art with an extreme fascination for complexity science and graph theory. When the weather is good, you’re likely to find her cycling the passes in beautiful Eastern Washington.