Transform Static Risk Assessment Into a Dynamic Data-Driven Strategy

Solutions Engineer, Neo4j

April 22, 2025

Commercial Credit Risk Assessment Is Complex

Introduction

Loan origination is a complex process that requires managers to analyze financial statements, leverage ratios, and profitability metrics. Beyond financial data, they also assess the company’s industry and business model.

These insights, along with other key factors, feed into internal risk rating models, which generate a risk rating that influences loan pricing and approval. Additional factors affecting the risk rating include industry trends and macroeconomic conditions, such as market fluctuations, regulatory risks, and the broader economic environment.

Much of this information is stored as unstructured data, such as annual report filings, quarterly earnings reports, balance sheets, and income statements. This not only adds to the complexity of the process but also introduces challenges like bias in data interpretation, lack of transparency in assessments, and the risk of overlooking critical factors due to human error.

Large Language Models to the Rescue

Large language models (LLMs) bring significant benefits across industries by automating tasks, improving decision-making, and increasing efficiency. In credit risk rating, one immediate advantage is their ability to extract information from large volumes of unstructured data and organize it into a database for faster, more efficient analysis.

But why stop there? Imagine how much deeper a manager’s analysis could be if the database also integrated news and market sentiment, legal documents, and even weather events — all with real-time insights at their fingertips. This would transform risk assessment from a static process into a dynamic, data-driven strategy. This is the brain upgrade our model needs.

Graph or Vector Database?

Graph databases and vector databases serve different purposes, and which one is the better fit depends on the use case. When the data involves complex relationships between entities, as it does here, a graph database has clear advantages over a pure vector database. Without going into much detail, here are a few reasons:

Complex Relationships

Graph databases represent and query complex relationships and networks.
Vector databases store high-dimensional vectors representing objects. While optimized for similarity search (finding items similar to a given vector), they don’t inherently capture or manage relationships between entities in a structured way.

Efficient Relationship Queries

Graph databases are optimized for querying and navigating the relationships between entities.
Vector databases don’t have the same natural support for traversing relationships in the data. Relationships are typically inferred based on similarity rather than explicitly defined, as in a graph.

Data Flexibility

Graph databases support a variety of relationships and can evolve dynamically. You can add new types of relationships between existing entities without major restructuring.
Vector databases are usually more rigid in structure. While vectors can represent complex data, modifying the way vectors are stored or interpreted often requires retraining models or reformatting the data.
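
To make the point about explicit relationships concrete, here is a minimal Python sketch. It assumes a toy in-memory list of edges in place of a real graph database, and the node names and relationship types are illustrative rather than taken from the actual model.

```python
from collections import defaultdict

# Toy graph stored as (source, relationship type, target) triples.
# Every name here is an illustrative assumption, not the real dataset.
edges = [
    ("Ford", "FILED", "ford-10k"),
    ("Tesla", "FILED", "tesla-10k"),
    ("ford-10k", "MENTIONS", "Lithium"),
    ("tesla-10k", "MENTIONS", "Lithium"),
    ("tesla-10k", "MENTIONS", "Nickel"),
]


def build_index(triples):
    """Index outgoing edges by (source, relationship type)."""
    index = defaultdict(set)
    for source, rel, target in triples:
        index[(source, rel)].add(target)
    return index


def companies_mentioning(index, companies, material):
    """Two-hop traversal: Company -FILED-> Document -MENTIONS-> material."""
    return sorted(
        company
        for company in companies
        if any(material in index[(doc, "MENTIONS")]
               for doc in index[(company, "FILED")])
    )


index = build_index(edges)
print(companies_mentioning(index, ["Ford", "Tesla"], "Nickel"))

# A new relationship type needs no restructuring: it is just another edge.
edges.append(("Panasonic", "SUPPLIES", "Tesla"))
index = build_index(edges)
print(sorted(index[("Panasonic", "SUPPLIES")]))
```

The traversal follows relationships that were stated explicitly, which is the part a pure similarity search has no direct equivalent for.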

My colleagues Tomaz Bratanic and Zach Blumenfeld put together two great blog posts showing how to build knowledge graphs from unstructured data: Building Knowledge Graphs with LLM Graph Transformer and GraphRAG Python Package: Accelerating GenAI With Knowledge Graphs.

The Brain Meets the Graph

Now, let’s explore how connected data can unlock deeper intelligence. I’ll skip the details of how the graph was built, but in short, it combines both structured and unstructured data. The model integrates financial insights from 10-K annual report filings, commodities data, company profiles, and even weather events to provide a comprehensive view. (Thanks to my colleague Alex Fournier for helping put this together.)

In this graph, each document has been broken into chunks, with key entities extracted, such as commodities, construction materials, and elements like copper, nickel, and lithium. In addition to the raw text, each chunk includes a vector embedding that captures its semantic meaning. A full-text index has also been created based on the text properties of these chunks.
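
For readers who want to reproduce the two indexes on the chunks, the sketch below builds the Cypher statements that could create them. The index names, the `Chunk` label, and the `text` property come from the queries in this post. The `embedding` property name, the 1536 dimensions, and the cosine similarity function are assumptions, and the `CREATE VECTOR INDEX` syntax needs a recent Neo4j 5 release.

```python
def index_statements(label="Chunk", text_prop="text",
                     embedding_prop="embedding", dimensions=1536):
    """Return Cypher statements that create the full-text and vector indexes.

    embedding_prop and dimensions are assumptions; adjust them to match
    how the embeddings were actually stored.
    """
    fulltext = (
        f"CREATE FULLTEXT INDEX ftChunkContent IF NOT EXISTS "
        f"FOR (c:{label}) ON EACH [c.{text_prop}]"
    )
    vector = (
        f"CREATE VECTOR INDEX `chunk-embedding-index-openai` IF NOT EXISTS "
        f"FOR (c:{label}) ON c.{embedding_prop} "
        f"OPTIONS {{indexConfig: {{"
        f"`vector.dimensions`: {dimensions}, "
        f"`vector.similarity_function`: 'cosine'}}}}"
    )
    return [fulltext, vector]


for statement in index_statements():
    print(statement)
```

Each printed statement can be run once against the database, for example from Neo4j Browser.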

This model can answer questions that were once impossible, or at least extremely time-consuming, to tackle. And the best part? In many cases, we can bypass the need for LLMs altogether and simply query the graph to drive business decisions.

For example, leveraging our full-text index, we can search for terms such as “dependency on lithium” and traverse our graph to provide more context to the result:

// Search the full-text index on chunk text, then walk back to the filing company.
CALL db.index.fulltext.queryNodes("ftChunkContent", "dependency on lithium")
YIELD node, score
WITH elementId(node) AS chunkId, score
CALL (chunkId) {
    // Chunk <- Document <- Company: who filed the document containing this chunk?
    MATCH (c:Chunk)<-[:HAS_CHUNK]-(d:Document)<-[:FILED]-(co:Company)
    WHERE elementId(c) = chunkId
    RETURN c.text AS Text, co.name AS Company
}
RETURN Text, Company, score
ORDER BY score DESC
LIMIT 10
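
To run this kind of search from application code, one option is to pass the search term as a query parameter instead of hard-coding it. The sketch below assumes a Neo4j Python driver object, whose `execute_query` method treats extra keyword arguments as query parameters. The function name `search_chunks` is hypothetical, and the query is a simplified form without the subquery, which I assume returns the same rows.

```python
FULLTEXT_QUERY = """
CALL db.index.fulltext.queryNodes("ftChunkContent", $search_term)
YIELD node, score
MATCH (node)<-[:HAS_CHUNK]-(:Document)<-[:FILED]-(co:Company)
RETURN node.text AS Text, co.name AS Company, score
ORDER BY score DESC
LIMIT $limit
"""


def search_chunks(driver, search_term, limit=10):
    """Run the full-text search and return (company, score, text) tuples.

    `driver` is assumed to be a neo4j.Driver (or anything exposing a
    compatible execute_query method).
    """
    # execute_query returns (records, summary, keys); only records are needed.
    records, _, _ = driver.execute_query(
        FULLTEXT_QUERY, search_term=search_term, limit=limit
    )
    return [(r["Company"], r["score"], r["Text"]) for r in records]
```

With a connected driver, `search_chunks(neo4j_driver, "dependency on lithium")` would return the ten best-scoring chunks together with the companies that filed them.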

A visualization of the results is shown below, with Tesla, Exxon, and Ford scoring highest for the search term.

And here’s the text matching our search term in Tesla’s 10-K filing:

“In addition, the battery packs that we produce make use of lithium-ion cells. On rare occasions, lithium-ion cells can rapidly release the energy they contain by venting smoke and flames in a manner that can ignite nearby materials as well as other lithium-ion cells. While we have designed our battery packs to passively contain any single cell’s release of energy without spreading to neighboring cells, there can be no assurance that a field or testing failure of our …”

With this model in place, we can explore the benefits of an LLM in the hands of our risk analyst.

Retrieval-Augmented Generation

By leveraging the vector embeddings in our chunk nodes, we can treat our model like a vector database and perform similarity searches. The Neo4j GraphRAG Python package abstracts the complexity of creating retrievers. For example:

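from neo4j_graphrag.retrievers import VectorRetriever

# neo4j_driver and openai_embedder are assumed to be created earlier:
# a Neo4j driver and an embedder matching the model used to build the index.
vector_retriever = VectorRetriever(
   neo4j_driver,
   index_name="chunk-embedding-index-openai",
   embedder=openai_embedder,
   return_properties=["text"],
)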
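
Conceptually, a similarity search ranks chunks by how close their embeddings are to the embedding of the question. The sketch below illustrates that idea with a brute-force cosine ranking over made-up three-dimensional vectors. It is not how the Neo4j vector index is implemented, and the chunk texts and vectors are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, chunks, k=2):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in chunks]
    return sorted(scored, reverse=True)[:k]


# Invented toy embeddings; real embeddings have hundreds of dimensions.
chunks = [
    ("lithium supply risk", [0.9, 0.1, 0.0]),
    ("hurricane damage to facilities", [0.0, 0.2, 0.9]),
    ("battery raw materials", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.0, 0.0]

for score, text in top_k(query, chunks):
    print(f"{score:.3f}  {text}")
```

Each chunk is scored on its own, so nothing in the ranking knows which company filed the chunk or what else that company mentions.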

Now we can ask:

Question: What companies are exposed to risk of lithium shortages?

Response:
“Ford is highly dependent on its suppliers to deliver components in accordance with Ford’s production schedule and specifications, and a shortage of or inability to acquire key components or raw materials, such as lithium, cobalt, nickel, graphite, and manganese, can disrupt Ford’s production of vehicles.”

“… availability of raw materials such as lithium, nickel, cobalt and/or other metals. The prices for these materials fluctuate and their available supply may be unstable, depending on market conditions and global demand for these materials. For example, as a result of increased global production of electric vehicles and energy storage products, suppliers of these raw materials may be unable to meet our [Tesla] volume needs. Additionally, our suppliers may not be willing or able to reliably meet our timelines or our cost and quality needs, which may require us to replace them with other sources.”

Since we also have weather events, we can try incorporating that into our risk analysis. For example:

Question: What companies might have weather affecting them and what are the materials at risk?

Response:
“The occurrence of natural disasters or extreme weather events, such as hurricanes, tropical storms, floods, wildfires, earthquakes, tsunamis, cyclones, typhoons, extended winter storms, droughts and tornadoes; epidemics, pandemics or disease outbreaks and other extreme events and man-made disasters, such as nuclear or biological attacks or other acts of violence, such as active shooter situations, whether as a result of war or terrorism or otherwise, can have a material adverse effect on the U.S. economy in general, our industries and us specifically.”

While similarity search provides decent results, it falls short when it comes to reasoning beyond individual chunk nodes to deliver more accurate answers. Let’s see how these same questions perform when using a GraphRAG retriever instead.

GraphRAG

This retriever combines vector search with Cypher traversals to pull in additional nodes and relationships, which results in more accurate answers. This time, we add a parameter to our retriever code: a traversal query that the retriever runs from the matched chunks to collect connected companies and entities from our graph model.

from neo4j_graphrag.retrievers import VectorCypherRetriever

# Appended to the vector search; `node` is each chunk returned by the index.
# Defined before the retriever so the name exists when it is passed in.
traversal_query = """
WITH node AS chunk
MATCH (co:Company)-[:FILED]->(:Document)-[:HAS_CHUNK]->(chunk)-[:MENTIONS]->(n)

WITH collect(DISTINCT chunk) AS chunks,
     collect(DISTINCT co) AS companies,
     collect(DISTINCT n) AS elements

RETURN
    '\n\n=== text ===\n' +
    apoc.text.join([c IN chunks | c.text], '\n---\n') +
    '\n\n=== graph ===\n' +
    '\n\n=== companies ===\n' +
    apoc.text.join([c IN companies | c.name], '\n---\n') +
    '\n\n=== elements ===\n' +
    apoc.text.join([e IN elements | labels(e)[0] + ': ' + e.name], '\n---\n')
    AS info
"""

# neo4j_driver, openai_embedder, and vector_cypher_formatter are assumed
# to be defined earlier.
graph_retriever = VectorCypherRetriever(
   neo4j_driver,
   index_name="chunk-embedding-index-openai",
   embedder=openai_embedder,
   result_formatter=vector_cypher_formatter,
   retrieval_query=traversal_query,
)

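To see what the traversal adds, here is a small Python sketch of the same idea outside the database: start from the chunks the vector search matched, follow their relationships to companies and mentioned entities, and assemble one context string for the LLM. The function name `build_context` and the toy dictionaries are hypothetical stand-ins for the graph, and the output layout only loosely mirrors the traversal query.

```python
def build_context(chunk_ids, chunk_text, filed_by, mentions):
    """Assemble LLM context from matched chunks plus their graph neighbors."""
    texts = [chunk_text[c] for c in chunk_ids]
    # Deduplicate, as collect(DISTINCT ...) does in the Cypher query.
    companies = sorted({filed_by[c] for c in chunk_ids if c in filed_by})
    elements = sorted({e for c in chunk_ids for e in mentions.get(c, [])})
    return (
        "\n\n=== text ===\n" + "\n---\n".join(texts)
        + "\n\n=== companies ===\n" + "\n---\n".join(companies)
        + "\n\n=== elements ===\n" + "\n---\n".join(elements)
    )


# Hypothetical toy data standing in for the graph.
chunk_text = {
    "c1": "A shortage of lithium can disrupt production.",
    "c2": "Suppliers may be unable to meet volume needs.",
}
filed_by = {"c1": "Ford", "c2": "Tesla"}
mentions = {
    "c1": ["Element: Lithium"],
    "c2": ["Element: Lithium", "Element: Nickel"],
}

print(build_context(["c1", "c2"], chunk_text, filed_by, mentions))
```

The LLM now receives the chunk text along with the companies and entities attached to it, which is why its answers can name companies and materials that a chunk-only retriever leaves implicit.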
Question: What companies are exposed to risk of lithium shortages?

Response:
1. Ford: Ford is highly dependent on the supply of lithium and other raw materials for the production of electric vehicle batteries. A shortage of lithium can disrupt Ford’s production of vehicles and increase production costs.

2. Tesla: Tesla relies on lithium-ion battery cells for its vehicles and energy storage products. A reduced availability of lithium may impact Tesla’s access to battery cells and its growth, potentially reducing profitability.

3. Panasonic: As a supplier of lithium-ion battery cells, Panasonic is exposed to the risk of lithium shortages, which could affect its ability to meet the demand from companies like Tesla and other clients.

Question: What companies might have weather affecting them and what are the materials at risk?

Response:
1. Amazon
— Weather events: Flood Warning
— Materials at risk: Glass
— Summary: Amazon’s operations could be impacted by flood warnings, which may affect their logistics and supply chain, particularly concerning the handling and transportation of glass materials.

2. CVS Health
— Weather events: None specified, but extreme events are mentioned.
— Materials at risk: Carbon (in the context of transition risks from carbon-based energy)
— Summary: While no specific weather events are mentioned, CVS Health is concerned about extreme events that could impact operations and increase healthcare costs. The transition from carbon-based energy also poses risks.

3. AT&T
— Weather events: None specified, but natural disasters and extreme weather conditions are mentioned.
— Materials at risk: Not specified
— Summary: AT&T’s infrastructure and operations could be disrupted by natural disasters and extreme weather conditions, which could lead to significant service disruptions and financial impacts.

These companies face potential risks from weather-related events that could disrupt their operations and supply chains, affecting their materials and overall business performance.

Summary

At first glance, GraphRAG might seem like just another tool in our GenAI toolkit. But dig deeper, and you start to see something more fundamental: a new way of assessing risk — not just by collecting data but by understanding its structure.

For our risk analyst, the challenge has never been the lack of information. It’s the fragmentation, the endless documents that bury insights under layers of irrelevant detail. GraphRAG changes that equation. It doesn’t just retrieve data; it connects the dots. It identifies patterns across seemingly unrelated disclosures, commodities reports, and even environmental events — surfacing what a traditional model might miss entirely.

And that’s the real story here — not that we’ve found a better retriever but that we’ve built a smarter brain, one that thinks a little more like an analyst and a little less like a machine.

Transform Static Risk Assessment Into a Dynamic Data-Driven Strategy was originally published in Neo4j Developer Blog on Medium.