Graph Databases for Beginners: Graph Search Algorithm Basics
data:image/s3,"s3://crabby-images/a6971/a6971aa5f014f5255bea4530560271690be71426" alt="Joy Chao on the Neo4j team"
Community Graphista
8 min read
data:image/s3,"s3://crabby-images/058e4/058e46dd2992e72d42d5628acf95d8c7d08139c4" alt="Learn about popular graph search algorithms like Dijkstra's algorithm and the A-star algorithm"
Depth- and Breadth-First Search Algorithms
There are two basic types of graph search algorithms: depth-first and breadth-first. The former type of algorithm travels from a starting node to some end node before repeating the search down a different path from the same start node until the query is answered. Generally, depth-first search is a good choice when trying to discover discrete pieces of information. They’re also a good strategic choice for general graph traversals. The most classic or basic level of depth-first is an uninformed search, where the algorithm searches a path until it reaches the end of the graph, then backtracks to the start node and tries a different path. On the contrary, dealing with semantically rich graph databases allows for informed searches, which conduct an early termination of a search if nodes with no compatible outgoing relationships are found. As a result, informed searches also have lower execution times. (For the record, Cypher queries and Java graph traversals generally perform informed searches.) Breadth-first search algorithms conduct searches by exploring the graph one layer at a time. They begin with nodes one level deep away from the start node, followed by nodes at depth two, then depth three, and so on until the entire graph has been traversed. Like so: As you see above, a breadth-first search starts at a given node (marked0
) and then travels to every node that’s only one hop away (marked 1
) before it travels to each node that’s two hops away (2
). Then, only after it’s traveled to all of the 2
nodes would it explore one more hop to the nodes marked 3
.
Dijkstra’s Algorithm
Like his super-nerd predecessors – al-Khwarizmi and Euler – Edsger Wybe Dijkstra was super into algorithms and super into graphs. That’s probably why he invented Dijkstra’s algorithm in twenty minutes. Like I said, super-nerd. The goal of Dijkstra’s algorithm is to conduct a breadth-first search with a higher level of analysis in order to find the shortest path between two nodes in a graph. Here’s how it works:- Pick the start and end nodes and add the start node to the set of solved nodes with a value of 0. Solved nodes are the set of nodes with the shortest known path from the start node. The start node has a value of 0 because it is 0 path-length away from itself.
- Traverse breadth-first from the start node to its nearest neighbors and record the path length against each neighboring node.
- Pick the shortest path to one of these neighbors and mark that node as solved. In case of a tie, Dijkstra’s algorithm picks at random (Eeny, meeny, miny, moe, presumably).
- Visit the nearest neighbors to the set of solved nodes and record the path lengths from the start node against these new neighbors. Don’t visit any neighboring nodes that have already been solved, as we already know the shortest paths to them.
- Repeat steps three and four until the destination node has been marked solved.
-
- via Adelaide at a cost of 50 hours, or
- via Darwin at a cost of 82 hours
The A* Algorithm
The A* algorithm improves upon Dijkstra’s algorithm by combining some of its elements with elements of a best-first search. Pronounced “A-star”, the A* is based on the observation that some searches are informed, which helps us make better choices over which paths to take through the graph. Like Dijkstra’s algorithm, A* can search a large area of a graph, but like a best-first search, A* uses a heuristic to guide its search. Additionally, while Dijkstra’s algorithm prefers to search nodes close to the current starting point, a best-first search prefers nodes closer to the destination. A* balances the two approaches to ensure that at each level, it chooses the node with the lowest overall cost of the traversal. As demonstrated by the example in the previous section, it is possible for Dijkstra’s algorithm to overlook a better route while trying to complete its search. For more information on how the A* algorithm is used in practice, consult Chapter 7 of Graph Databases from O’Reilly Media.Conclusion
Graph search algorithms help you (or your friendly computer sidekick) traverse a graph dataset in the most efficient means possible. But “most efficient” depends on the results you’re looking for – a breadth-first search isn’t the most efficient if your results are better suited to depth-first queries (and vice versa). This is where your super-nerd friends (or even a helpful ebook, if you don’t have friends) come in handy. If they’re smart, they’ll match the right graph algorithm with the type of results you’re looking for. But now when they start talking breadth-first vs. depth-first vs. best-first you can tell them you know your A* from your Dijkstra and this isn’t your first rodeo. Go make al-Khwarizmi proud.-
- Why Graph Technology Is the Future
- Why Connected Data Matters
- The Basics of Data Modeling
- Data Modeling Pitfalls to Avoid
- Why a Database Query Language Matters (More Than You Think)
- Imperative vs. Declarative Query Languages: What’s the Difference?
- Graph Theory & Predictive Modeling
- Why We Need NoSQL Databases
- ACID vs. BASE Explained
- A Tour of Aggregate Stores
- Other Graph Data Technologies
- Native vs. Non-Native Graph Technology