DeFi Knowledge Graph with Neo4j


Decentralized finance (DeFi) is a movement working to replicate and replace traditional finance (TradFi) using blockchain and cryptocurrencies.

Today you can lend, borrow, swap, margin trade, and even create your own mini hedge fund on chain. It’s all permissionless and trustless.

Photo by André François McKenzie on Unsplash

Most DeFi takes place on Ethereum. One notable exception is a protocol called Sovryn, which brings financial primitives to Bitcoin using the Rootstock sidechain (RSK).

The project I’ll cover today creates a knowledge graph of the Sovryn protocol using the graph database Neo4j.

My goals were to

  1. Learn how the Sovryn protocol really works. Blockchain and DeFi are advancing quickly, and the pace of progress is faster than the advances in tooling. After reading the documentation you have to read the source code of the smart contracts, or explore the raw block-level data. Both are difficult and unintuitive, particularly for developers that may be new to the space.
  2. Get a robust dataset about Sovryn. If you want to understand the activity on the protocol, and the developers don’t happen to provide the answer on the official app, you have to dig into the block data. This is clunky and much of the data is not human-readable.
    It’s also difficult to get a summary of the data. A single transaction might describe a swap between bitcoin (technically WRBTC) and the stablecoin Tether. What if you wanted to find all such transactions? Today you would have to download the whole chain and interact with the ABI.
    If you would rather spend your time doing data science instead of blockchain development, this is a serious barrier to entry.

Graphing the Chain

To create a knowledge graph of blockchain data we need to define only a few different types of nodes: Block, Transaction, Address, Token, Contract, and LogEvent. Token and Contract are subtypes of Address. Strictly speaking Token and Contract aren’t necessary but they certainly are convenient for helping humans make sense of what’s going on.

Each Block will CONTAIN zero or more Transactions. The Transactions are where much of the action is. Each Transaction is from one Address and may be to another one. If these addresses describe known Tokens or Contracts then the information for those will be filled in.

Each Transaction has one or more LogEvents. Each of these events CALLS various Addresses (or Tokens, or Contracts). In creating this knowledge graph, a number of ABIs were parsed so that the information in each of the CALLS relationships is human-readable.

The result is a simple schema that can capture all the richness of the blockchain. Very satisfying!

A simple schema can capture all the richness of the blockchain.
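
To make the schema concrete, here is a minimal sketch of it as plain Python data. The node labels come from the description above and the relationship names are the ones used in the queries later in this post. The direction of TO and CALLS, and the idea that a Token or Contract node also carries the Address label, are my assumptions for illustration and may not match the repository exactly.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    source: str    # label of the node the relationship starts from
    rel_type: str  # relationship type, as written in Cypher
    target: str    # label of the node the relationship points to


# Token and Contract are subtypes of Address.
ADDRESS_SUBTYPES = {"Token", "Contract"}

# Assumption: TO and CALLS point toward the Address node.
SCHEMA = [
    Edge("Block", "CONTAINS", "Transaction"),
    Edge("Transaction", "FROM", "Address"),
    Edge("Transaction", "TO", "Address"),
    Edge("Transaction", "HAS_EVENT", "LogEvent"),
    Edge("LogEvent", "CALLS", "Address"),
]


def as_cypher_pattern(edge: Edge) -> str:
    """Render one schema edge as a Cypher path pattern."""
    return f"(:{edge.source})-[:{edge.rel_type}]->(:{edge.target})"


def labels_for(kind: str) -> List[str]:
    """Return the labels a node of this kind would carry (assumed multi-label)."""
    return ["Address", kind] if kind in ADDRESS_SUBTYPES else [kind]


for edge in SCHEMA:
    print(as_cypher_pattern(edge))
```

Five relationship types are enough to describe everything in this post, which is what makes the schema pleasant to query.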

A Quick Tour of the Knowledge Graph

If you want to take the tour with me, check out the repository or the video tour.

One goal is to be able to load data from the protocol directly into Python.

To that end I put a wrapper around the Neo4j session to give it a little syntactic sugar. You can type any query directly into a knowledge_graph.Query object. First, let’s see a few blocks with available data.

from sovrynkg.knowledge_graph import Query
q = Query()
q.add("MATCH (b:Block) RETURN b.height as height ORDER BY height LIMIT 10")
q.data()

[{'height': 2742418},
{'height': 2742441},
{'height': 2742445},
{'height': 2742446},
{'height': 2742448},
{'height': 2742450},
{'height': 2742451},
{'height': 2742453},
{'height': 2742457},
{'height': 2742460}]
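
If you are curious what a wrapper like this might look like, here is a rough sketch. It is not the actual sovrynkg code: the constructor argument, the neo4j_runner helper, and the text property are all hypothetical, and I pass in the function that executes the query so the class can be tried without a running database. The neo4j_runner helper shows how it could be wired to the official neo4j Python driver.

```python
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]


def neo4j_runner(uri: str, user: str, password: str) -> Callable[[str], Rows]:
    """Build a runner backed by the official neo4j driver (imported lazily)."""
    from neo4j import GraphDatabase  # needs `pip install neo4j`

    driver = GraphDatabase.driver(uri, auth=(user, password))

    def run(cypher: str) -> Rows:
        with driver.session() as session:
            return session.run(cypher).data()

    return run


class Query:
    """Hypothetical stand-in for knowledge_graph.Query, not the real class."""

    def __init__(self, run: Callable[[str], Rows]):
        self._run = run
        self._parts: List[str] = []

    def add(self, cypher: str) -> "Query":
        """Append a Cypher fragment; fragments are joined with newlines."""
        self._parts.append(cypher.strip())
        return self

    @property
    def text(self) -> str:
        return "\n".join(self._parts)

    def data(self) -> Rows:
        """Run the accumulated query and return one dict per record."""
        return self._run(self.text)

    def only(self) -> Dict[str, Any]:
        """Return the single record, raising if there is not exactly one."""
        rows = self.data()
        if len(rows) != 1:
            raise ValueError(f"expected exactly 1 record, got {len(rows)}")
        return rows[0]


# Try it with a fake runner that always returns one canned record.
q = Query(lambda cypher: [{"height": 2742418}])
q.add("MATCH (b:Block) RETURN b.height AS height ORDER BY height LIMIT 1")
print(q.only())
```

The real Query() takes no arguments and finds the database connection on its own; injecting the runner here is just a convenience for the sketch.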

So 2742418 is where it all began. Let’s see the transaction at that block. A Cypher query that gets us that data is:

MATCH (b:Block)-[:CONTAINS]->(tx:Transaction)-[:HAS_EVENT]->
(le:LogEvent)-[:CALLS]-(addy:Address)
WHERE b.height=2742418
RETURN b, tx, le, addy

Deciphering the Cypher: this says to find a block with the given block height, and also find the Transaction, LogEvent, and any Address that is connected (remember Token and Contract are also Address).

Sovryn is born.

Inspecting the CALLS relationship, we find:

"name": "OwnershipTransferred",
"newOwner": "0x7be508451cd748ba55dcbe75c8067f9420909b49",
"previousOwner": "0x0000000000000000000000000000000000000000"

The first transaction on the Sovryn protocol is the creation of the contract. On RSK, contracts are created by “transferring” ownership from the null address.
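
That pattern is easy to check for in code. The sketch below assumes the properties of a CALLS relationship arrive as a plain dict with the keys shown above; the helper names are mine, not part of sovrynkg.

```python
def is_null_address(address) -> bool:
    """True if the hex string is the all-zero address."""
    try:
        return int(address, 16) == 0
    except (TypeError, ValueError):
        return False


def is_initial_ownership(event: dict) -> bool:
    """True for an OwnershipTransferred event whose previous owner is null."""
    return (
        event.get("name") == "OwnershipTransferred"
        and is_null_address(event.get("previousOwner"))
    )


event = {
    "name": "OwnershipTransferred",
    "newOwner": "0x7be508451cd748ba55dcbe75c8067f9420909b49",
    "previousOwner": "0x0000000000000000000000000000000000000000",
}
print(is_initial_ownership(event))  # True
```

Parsing the address as a number avoids having to worry about upper or lower case hex digits.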

So What? I’m Here for the Money

Let’s chase the money. First, find a reasonably high-value transaction:

q = Query()
q.add("MATCH (tx:Transaction) RETURN tx ORDER BY tx.value DESC LIMIT 1")
result = q.only()
result

{'tx': {'gas_price': 60000000,
'gas_offered': 172201,
'gas_spent': 172201,
'gas_quote': 0,
'gas_quote_rate': 4083,
'tx_offset': 4,
'value_quote': 7350,
'tx_hash': '0xcaefac99f076cd6e9e02a2b1309056eebab634f7cdf0ff28b7050dbc37c9110d',
'value': 1800000000000000000,
'successful': True}}

This transaction involved 1.8 wrapped BTC (about $55k USD). Amounts are stored as integers with 18 decimal places, so the value 1800000000000000000 corresponds to 1.8 WRBTC.
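
The conversion from the raw on-chain integer to whole tokens is a one-liner. Here is a small sketch; the function name is mine, and I use Decimal rather than float so that large values are not rounded.

```python
from decimal import Decimal

TOKEN_DECIMALS = 18  # number of decimal places used for WRBTC amounts


def to_token_units(raw_value: int, decimals: int = TOKEN_DECIMALS) -> Decimal:
    """Convert an on-chain integer amount into whole-token units."""
    return Decimal(raw_value) / (Decimal(10) ** decimals)


print(to_token_units(1800000000000000000))  # 1.8
```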

Let’s get more details. We use the following (slightly verbose) query to pull out everything having to do with that single transaction.
It’s very similar to the query above, except this time we also get the Addresses that the bitcoin was sent FROM and TO, in addition to all the other information.

MATCH (b:Block)-[:CONTAINS]->(tx:Transaction) 
WHERE tx.tx_hash="0xbef02237efff3788082b28d74e34c7c245e1e8ea6a5b1da4d40967ddd08fd5a8"
MATCH (frm:Address)<-[:FROM]-(tx)-[:TO*0..1]-(to:Address)
MATCH (tx)-[:HAS_EVENT]->(le:LogEvent)-[:CALLS]-(addy:Address)
RETURN tx, le, addy, frm, to

High value transaction

Looks like this transaction was a loan. Whoever owns the from address 0x5d0eeaeabd5123e3d557c8a552134f24c6271a74 borrowed 1.8 WRBTC.

This address doesn’t seem to match any Contract or Token documented as part of the Sovryn protocol, so it’s probably just some person out there on the chain.

Larger Scale Analysis

These colorful circles are all well and good, but what if you want to analyze meaningful amounts of data?

We can use the knowledge graph to do larger scale analysis as well. Let’s look at a swap — exchanging one type of token for an equal monetary value of another.

We’ll limit the number of results for this example, but you could just remove the skip and limit keyword arguments to get all the data.

import plotly.express as px
from sovrynkg.swaps import get_swap_df

df = get_swap_df(skip=1000, limit=1000)
df.head()

Results from a query about swaps, given as a dataframe

Great, we have the data. Now let’s try to make sense of it. If we want to get more information about the addresses we can use a built-in tool.

import sovrynkg.contracts as contracts
wrbtc = contracts.BY_NAME['WRBTC']
wrbtc, wrbtc.address


(<Token WRBTC:0x542…677d>, '0x542fda317318ebf1d3deaf76e0b632741a7e677d')

You can slice and dice your dataframe in powerful ways. Let’s look at the history of the WRBTC/USDT swaps here.

# Keep only swaps from USDT into WRBTC; copy so a new column can be added safely
bt_pair = df[(df.to_token == 'WRBTC') & (df.from_token == 'USDT')].copy()

# Both WRBTC and USDT have 18 decimals, so the raw amounts can be divided directly
bt_pair['exchange_rate'] = bt_pair.from_amount / bt_pair.to_amount

fig = px.line(bt_pair, x='signed_at', y='exchange_rate',
              title='WRBTC vs USDT swap on Sovryn')
fig.show()

Exchange rate between Bitcoin and Tether (~$1 USD) on the Sovryn protocol over time

Knowledge Graphs to Answer Any Question

The amazing thing about a knowledge graph is that for just about any question you can dream up, the answer is embedded in the data somehow.

You just have to be clever enough to craft a query to find it. It’s this richness of exploration that makes knowledge graphing with Neo4j such a good tool for exploring blockchains.

  • How does the protocol work?
  • Who are the biggest users?
  • Are there any leading indicators of price movements between one cryptocurrency pair or another?
  • Given an outside dataset of the Sovryn team’s marketing efforts, is there an effect on trading volume on the protocol?
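
As one illustration, here is how the "biggest users" question might be approached. This is only a sketch: I am assuming Address nodes expose an address property and that "biggest" means the largest total value sent, neither of which is confirmed by the repository. The Cypher does the aggregation inside Neo4j; the Python function does the same ranking on rows you have already pulled down.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumption: Address nodes have an `address` property.
TOP_SENDERS_CYPHER = """
MATCH (tx:Transaction)-[:FROM]->(a:Address)
RETURN a.address AS address, sum(tx.value) AS total_value
ORDER BY total_value DESC
LIMIT 10
"""


def rank_senders(rows: Iterable[Dict], top: int = 10) -> List[Tuple[str, int]]:
    """Sum `value` per `address` and return the largest totals first."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["address"]] += row["value"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Made-up rows for illustration only, not real chain data.
sample = [
    {"address": "0xaaa", "value": 5},
    {"address": "0xbbb", "value": 7},
    {"address": "0xaaa", "value": 4},
]
print(rank_senders(sample))  # [('0xaaa', 9), ('0xbbb', 7)]
```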

If you’ve ever set up an SQL database to answer one question, only to be immediately asked an entirely different one, you’ll be able to sympathize.

To Ethereum, and Beyond

Because the RSK sidechain of Bitcoin is compatible with the Ethereum Virtual Machine, we could unleash this same code onto Ethereum and map out that entire chain as well.

A continuously updated knowledge graph plus a convenient SDK would be a very useful package.

If anyone out there is interested in seeing that, I’d invite you to get in touch.

Again, the GitHub repository is here if you want to try it yourself.



Knowledge Graph DeFi with Neo4j was originally published in Neo4j Developer Blog on Medium.