Markov Chains
A while back I wrote a blog post explaining Markov chains and demonstrating different ways of finding their steady-state distribution in R. Now, I want to play with Markov chains as a graph. I’m going to pull examples from around the internet and answer the same questions in Cypher as the authors do with matrices. This gives me the opportunity to explore more advanced Cypher queries while working with a topic I enjoy very much (stochastic processes and Markov chains). So this is officially just for funsies.
I found three Markov chains online that I’m going to showcase, and they involve the following topics:
If you are unfamiliar with Markov chains, you might want to refer to my blog post linked above for a quick overview.
Finance
Description
The Wikipedia article on Markov chains has an example consisting of states and probabilities for a hypothetical stock market. The market can be in one of three states:
-
Bull
-
Bear
-
Stagnant
Graph Representation
Wikipedia conveniently provides a graphical representation of the states and probabilities:

Question
The question posed in the article is: If we are currently in a Bear market, what is the probability distribution three time-steps from now? In other words, what is the probability we will be in either a Bull, Bear, or Stagnant market three days from now given we are in a Bear market today?
Solve with Matrices
Wikipedia solves this by taking the one-time-step transition probability matrix, \(P\), to the third power and extracting the relevant row:
\(Answer = [0.3575, 0.56825, 0.07425]\)
Solve with Cypher
In Cypher, we can achieve the same answer with the following:
MATCH (w)-[r]->(x)
WHERE w.name= "Bear"
WITH x, r.p as p1
MATCH (x)-[r]->(y)
WITH y, p1, r.p as p2
MATCH (y)-[r]->(z)
WITH z, p1 * p2 * r.p AS product
RETURN z.name AS `State at time t = 3`, SUM(product) AS Probability
This tells us that three days from now, given we are currently in a Bear market, we will be in a Bull market with probability 0.3575, in a Bear market with probability 0.56825, or in a Stagnant market with probability 0.07425.
MATCH n-[r]-m
WITH n, r
DELETE n, r
Manufacturing
Another example I found during my Google search expedition is the following from slide 4 of this Beamer presentation. The language in the presentation is not very clear; I think the author switched the example from a car to a machine but did not change all instances of the word 'car' in the presentation. Anyway, here is a description:
Description
Machines at a factory are either working or broken. If a machine is currently working, it will break down by tomorrow with probability \(b\). If a machine is currently broken, it will be repaired by the maintenance crew by tomorrow with probability \(r\). This factory has a rule that if a machine is broken for at least three days, then the machine is put on a list of urgent repairs and will be repaired by tomorrow with probability \(1\).
Because the machine can be broken for up to but not exceeding three days, this Markov chain consists of four states:
-
Working
-
Broken for 1 day
-
Broken for 2 days
-
Broken for 3 days
Question
From the presentation (slide 7):
The factory buys 100 machines. 25 of the machines are bought used; of these 25, 20 have been broken for one day and 5 have been broken for two days. The remaining machines are new and are in working order. Four days from now, how many machines do we expect to have on our urgent repair list? In other words, how many machines will be in their third day of disrepair on day four?
The presentation considers the following values for \(r\) and \(b\):
\(r = 0.8\)
\(b = 0.1\)
Solve with Matrices
To solve using matrices / vectors, we first need the initial distribution vector. We know 20 machines have been broken for one day, 5 machines have been broken for two days, 0 have been broken for three days, and thus 100 - 25 = 75 machines are in working order:
Next, we need to find \(P^4\) so we can have the four-time-step transition probability matrix:
The answer can now be found by multiplying the initial distribution vector \(\mu_0\) by the relevant column (the column corresponding to Broken3) of the four-time-step transition probability matrix \(P^4\):
\(= 0.0036(0.75) + 0.0032(0.20) + 0.0032(0.05) + 0.0040(0)\)
\(= 0.0035\)
Solve with Cypher
To solve this in Cypher, we get to use a CASE statement! How fun:
MATCH (v)-[r]->(w)
WITH v, w, r.p AS p1
MATCH (w)-[r]->(x)
WITH v, x, p1, r.p AS p2
MATCH (x)-[r]->(y)
WITH v, y, p1, p2, r.p AS p3
MATCH (y)-[r]->(z)
WITH v, z, p1 * p2 * p3 * r.p AS product
WHERE z.name = "Broken3"
WITH v,
product * (
CASE
WHEN v.name="Working" THEN 0.75
WHEN v.name="Broken1" THEN 0.20
WHEN v.name="Broken2" THEN 0.05
ELSE 0
END) AS dist
RETURN SUM(dist) AS Probability
This tells us that we expect 0.0035 or 0.35% of machines to be in their third day of disrepair on day four. 0.35% of 100 is 0.35, so we expect 0.35 machines to be in their third day of disrepair on day four.
Moreover, we can find the distribution of all 100 machines (the presentation does not ask or go over this) with some tweaks in the query:
MATCH (v)-[r]->(w)
WITH v, w, r.p AS p1
MATCH (w)-[r]->(x)
WITH v, x, p1, r.p AS p2
MATCH (x)-[r]->(y)
WITH v, y, p1, p2, r.p AS p3
MATCH (y)-[r]->(z)
WITH v, z, p1 * p2 * p3 * r.p AS product
WITH v, z,
product * (
CASE
WHEN v.name="Working" THEN 0.75
WHEN v.name="Broken1" THEN 0.20
WHEN v.name="Broken2" THEN 0.05
ELSE 0
END) AS dist
WITH z, SUM(dist) AS prob
ORDER BY prob DESC
RETURN z.name AS State, prob*100 AS `Expected Number of Machines in State on Day 4`
From this, we find that we expect there to be, on day four, roughly 88.97 machines in working order, 8.91 machines in their first day of disrepair, 1.78 machines in their second day of disrepair, and 0.35 machines (which we just found earlier) in their third day of disrepair.
MATCH n-[r]-m
WITH n, r
DELETE n, r
Actuarial Science
In actuarial science, insurance holders are often modeled in Markov chains where the state is typically some indication of their profitability and the probabilities of changing states is typically a function of the insurance holder’s attributes, such as their weight and whether or not they smoke (if we’re dealing with health insurance), or maybe attributes of their car, such as its maximum speed and color (if we’re dearling with automobile insurance).
Using #20 from this exam as inspiration, we have the following:
Description
Insurance holders can belong to one of three classes for automobile insurance:
-
Preferred
-
Standard
-
Substandard
At the end of each year, insurance holders can transition from one class to another according to the following one-time-step transition probability matrix:
Question 1
If an insurance holder is currently in the Standard class, what is the probablity he will be in the Standard class in two years?
Solve with Matrices
As we know, we need to find \(P^2\) in order to answer this question. The answer will be the (Standard, Standard) entry of this matrix:
\(Answer = 0.52\)
There is a 52% chance the customer currently in the Standard class will be in the Standard class in two years.
Question 2
A question not asked by the exam but that is interesting: What is the probability an insurance holder currently in the Standard class will not be in the Substandard class in any of the next five years?
Solve with Matrices
In order to solve using matrices, we need to arbitrarily change the one-time-step transition probability matrix so that the Substandard class is an 'absorbing state'. With this, if the Markov chain ever enters the Substandard state, it can’t leave that state. We do this because we eventually find the 'complement' of the probabilities resulting from this arbitrary transition probability matrix in order to answer questions involving scenarios where the process can never enter that state. So, our new one-time-step transition probability matrix is:
Notice the third row (the row corresponding to the Substandard state): \([0, 0, 1]\). This means that there is a 0 probability of transitioning from the Substandard class to the Preferred class, a 0 probability of transitioning from the Substandard class to the Standard class, and a 100% probability of staying in the Substandard class. With this, we then find the five-time-step transitiion probability matrix of our new matrix \(A\):
Like I mentioned earlier, we are interested in the 'complement' of the probabilities shown here when answering questions involving the constraint that the insurance holder 'never' enters the Substandard class. Thus, our answer is the complement of the (Standard, Substandard) entry of the matrix shown above, since we want to know the probability that an insurance holder currently in the Standard class is never in the Substandard class during the next five years:
\(Answer = 1 - 0.55572 = 0.44428\)
Solve with Cypher
In Cypher, we do not need to make any arbitrary changes to the transition probabilities. We just tell the query to avoid the Substandard class:
MATCH (u)-[r]->(v)
WHERE u.name = "Standard" AND v.name <> "Substandard"
WITH v, r.p AS p1
MATCH (v)-[r]->(w)
WHERE w.name <> "Substandard"
WITH w, p1, r.p AS p2
MATCH (w)-[r]->(x)
WHERE x.name <> "Substandard"
WITH x, p1, p2, r.p AS p3
MATCH (x)-[r]->(y)
WHERE y.name <> "Substandard"
WITH y, p1, p2, p3, r.p AS p4
MATCH (y)-[r]->(z)
WHERE z.name <> "Substandard"
WITH z, p1 * p2 * p3 * p4 * r.p AS prob
RETURN SUM(prob) AS Probability
Is this page helpful?