We’re thrilled to collaborate with Microsoft on a unified data offering to uncover hidden data patterns and address critical GenAI challenges, such as the ability to provide personalization and deliver relevant search outcomes through additional context.
Neo4j’s graph capabilities (knowledge graphs, data science algorithms, graph-powered RAG, and vector and semantic search) will now be natively integrated into the Microsoft Fabric analytics platform and Microsoft Azure OpenAI Service.
Customers using Neo4j, Microsoft Fabric, and Azure OpenAI together can seamlessly combine structured and unstructured data, easily discover hidden patterns across billions of data connections, enhance contextual understanding within their data, and rapidly deliver enterprise-grade GenAI applications.
The new offering allows organizations to:
- Transform unstructured data into knowledge graphs. Azure OpenAI can process unstructured data and load it into a knowledge graph, allowing Neo4j query tools to extract powerful insights.
- Enhance contextual understanding and explainability with GraphRAG. GraphRAG applications, fully integrated with Neo4j’s GenAI functions and Azure OpenAI, can use knowledge graphs derived from enterprise data to enhance query prompts dynamically.
- Provide long-term memory for LLMs with vector embedding integration. Neo4j supports native vector embeddings, and developers can use OpenAI embedding APIs to create embeddings and store them in the Neo4j database.
- Generate graph-powered insights as part of Fabric. Fabric customers can now use Neo4j graph analytics capabilities to quickly find hidden patterns and relationships within their data.
- Deploy Neo4j Graph Analytics as a native Fabric workload. Neo4j Graph Analytics will soon be a native Microsoft Fabric workload, allowing users to create graph models from OneLake data, run Graph Data Science algorithms, and write results back into OneLake.
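To make the "long-term memory" idea in the list above more concrete, here is a minimal pure-Python sketch of embedding-based recall. It is an illustration only: the three-dimensional vectors are made-up stand-ins for real embeddings, the `MEMORY` dictionary and function names are hypothetical, and in practice the lookup would be handled by a vector index in Neo4j rather than by a Python loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query_vector, memory, k=2):
    """Return the k stored texts whose embeddings are closest to the query."""
    scored = [(cosine_similarity(query_vector, vec), text)
              for text, vec in memory.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "embeddings" (assumed values; real embeddings have far more dimensions).
MEMORY = {
    "Patient reports a persistent cough": [0.9, 0.1, 0.0],
    "Invoice overdue by 30 days": [0.0, 0.2, 0.9],
    "Patient has lost weight recently": [0.7, 0.6, 0.1],
}

# The query vector points in roughly the same direction as the two patient notes.
print(top_k([1.0, 0.2, 0.0], MEMORY, k=2))
```

The stored texts that come back can then be added to an LLM prompt as context, which is the sense in which the database acts as memory.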
Here’s a more detailed look at the integrations, starting with Neo4j + Azure OpenAI and moving on to Neo4j + Microsoft Fabric.
Neo4j and Azure OpenAI: Graph Insights and Enterprise-Ready GenAI
Since 2009, Neo4j has pioneered graph databases, helping organizations find meaning in complex, interconnected datasets. Graph technology is set to transform enterprise analytics and AI, with Gartner predicting that it will drive 80% of data and analytics innovations by 2025—up from 10% in 2021. Gartner also sees knowledge graphs as “a vital capability” and “the first step to resolving fragmented data management issues by enabling a GenAI-augmented data fabric.”
Neo4j has been working with Azure OpenAI Service since its private preview. Combining the tools allows us to build graphs from data we might not be able to ingest and model otherwise, and to make those graphs accessible to a much broader group of people.
Any industry with underused, highly connected data can benefit from this approach. In the financial services industry, for example, Neo4j and Azure OpenAI can support fraud and money laundering detection. Supply chain and manufacturing use cases include knowledge transfer, bill of materials management, and optimization applications.
From our first project with OpenAI onward, we’ve observed a common architecture for applying GenAI with graphs:
Neo4j’s Knowledge Graph and Generative AI reference architecture
This architecture consists of:
- Ingestion – Extracting a knowledge graph from structured, semi-structured, and even unstructured data using the Azure OpenAI Service, then feeding it into the Neo4j Graph Database. Source data might reside in Fabric, Azure Blob Storage, or elsewhere. Automating ingestion with generative AI reduces the cost of getting started with graph databases, making it possible to gain value from connections in data where this was previously impossible.
- Consumption – Before generative AI, interacting with graphs and getting value from the connections in data required deep expertise. Layering Azure OpenAI Service over Neo4j enables any user to interact with a graph.
Consider a specific example: a collection of medical case sheets. We’re going to parse that data to build a knowledge graph and then layer a chat interface on top, running in a Streamlit application:
This example shows transforming unstructured data to a knowledge graph for consumption
Ingestion and Knowledge Graph Extraction
In this example, we use zero-shot prompting with a simple prompt and the gpt-4-32k model (see the OpenAI model documentation for more information). That allows us to extract case sheet information for each person into a Neo4j knowledge graph (see the notebook in the GitHub repository for more details). Here’s the resulting data model:
Showing extraction of case sheet information for each person into a Neo4j data model
Let’s consider what we’ve just accomplished. We’ve used generative AI to build and populate a knowledge graph with unstructured medical case data. This project might take weeks to do manually; with Neo4j and the OpenAI model, we’ve done it in minutes. We can apply graphs to entirely new problems. The vast stores of untapped enterprise data—consider information regarding drug interactions, shipping routes, or data breaches—hold the potential to construct new graphs. This allows users to derive value from connections within their data that were previously undiscovered.
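The ingestion step described above can be sketched roughly as follows. This is not the code from the notebook: the prompt wording, the JSON keys, and the `Person`, `Symptom`, and `Diagnosis` labels are all assumptions made for illustration, and `SAMPLE_REPLY` is a hand-written stand-in for what the model might return. The sketch shows the two halves of the technique: a zero-shot prompt that asks for structured output, and a function that turns that output into parameterized Cypher `MERGE` statements.

```python
import json

# Hypothetical zero-shot prompt: no worked examples are given to the model.
PROMPT_TEMPLATE = (
    "Extract the patient, symptoms, and diagnoses from the case sheet below. "
    "Reply with JSON only, using the keys patient, symptoms, and diagnoses.\n\n"
    "Case sheet:\n{case_sheet}"
)


def build_prompt(case_sheet):
    """Fill the zero-shot prompt with one case sheet."""
    return PROMPT_TEMPLATE.format(case_sheet=case_sheet)


def to_cypher(llm_reply):
    """Turn the model's JSON reply into (query, parameters) pairs."""
    data = json.loads(llm_reply)
    name = data["patient"]
    statements = [("MERGE (p:Person {name: $name})", {"name": name})]
    # Labels and relationship types below are assumed, not the post's data model.
    for label, rel, key in (
        ("Symptom", "HAS_SYMPTOM", "symptoms"),
        ("Diagnosis", "HAS_DIAGNOSIS", "diagnoses"),
    ):
        for value in data.get(key, []):
            statements.append((
                f"MATCH (p:Person {{name: $name}}) "
                f"MERGE (x:{label} {{name: $value}}) "
                f"MERGE (p)-[:{rel}]->(x)",
                {"name": name, "value": value},
            ))
    return statements


# Hand-written stand-in for a model reply (illustrative data only).
SAMPLE_REPLY = (
    '{"patient": "Jane Doe", "symptoms": ["cough", "weight loss"], '
    '"diagnoses": ["tuberculosis"]}'
)

for query, params in to_cypher(SAMPLE_REPLY):
    print(query, params)
```

Each pair could then be sent to the database, for example with `session.run(query, params)` in the Neo4j Python driver. Using `MERGE` rather than `CREATE` keeps the load idempotent, so a symptom shared by many patients becomes a single node with many incoming relationships.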
Consumption
What’s next now that we have a graph? Neo4j offers many tools to interact with the graph, from the Cypher query language to the Bloom graph visualization tool. Generative AI lets us do something new: interact with the graph by asking natural language questions. One of the simplest ways to do this is with the LangChain integration with both Neo4j and the Azure OpenAI Service:
Showing the flow to convert the natural language prompt to Neo4j Cypher query language
Stepping through this, we see that a user enters a prompt to ask, “How many of my patients suffer from both coughing and weight loss?” That is passed to the chatbot, a Python application that uses LangChain. LangChain passes the prompt to the Azure OpenAI Service, which converts the question into a Cypher statement that will answer it. LangChain then passes that Cypher statement to the Neo4j database, which returns a response.
Finally, LangChain invokes the Azure OpenAI Service once more. The service summarizes the JSON blob resulting from the database query into natural language, which is presented to the user in the Streamlit UI.
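The three-step flow above (question to Cypher, Cypher to rows, rows to a natural language answer) can be sketched without any external services. This is not LangChain's API; LangChain's own Cypher question-answering chain handles these steps in a real application. Here the model and the database are passed in as plain callables, and `fake_llm`, `fake_db`, the schema string, and the generated query are all hypothetical stand-ins so the example runs on its own.

```python
import json


def answer_question(question, llm, run_cypher, schema):
    """Question -> Cypher -> rows -> natural language answer."""
    # Step 1: ask the model to translate the question into Cypher.
    cypher = llm(
        f"Graph schema:\n{schema}\n\n"
        f"Write one Cypher query that answers: {question}\n"
        "Return only the query."
    ).strip()
    # Step 2: run the generated query against the database.
    rows = run_cypher(cypher)
    # Step 3: ask the model to summarize the raw rows in plain language.
    return llm(
        f"Question: {question}\n"
        f"Query result (JSON): {json.dumps(rows)}\n"
        "Answer the question in one sentence."
    ).strip()


def fake_llm(prompt):
    """Stand-in for the Azure OpenAI Service (canned replies, assumed)."""
    if prompt.startswith("Graph schema"):
        return (
            "MATCH (p:Person)-[:HAS_SYMPTOM]->(:Symptom {name: 'cough'}), "
            "(p)-[:HAS_SYMPTOM]->(:Symptom {name: 'weight loss'}) "
            "RETURN count(DISTINCT p) AS patients"
        )
    return "3 of your patients suffer from both coughing and weight loss."


def fake_db(cypher):
    """Stand-in for the Neo4j database (returns a fixed row, assumed)."""
    return [{"patients": 3}]


print(answer_question(
    "How many of my patients suffer from both coughing and weight loss?",
    fake_llm,
    fake_db,
    schema="(:Person)-[:HAS_SYMPTOM]->(:Symptom)",
))
```

Passing the graph schema in the first prompt matters in practice: without it, the model has to guess label and relationship names, and the generated Cypher is far less likely to match the actual graph.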
So, what have we accomplished? We’ve enabled folks to ask a Neo4j database questions in natural language. That means non-technical users can now explore graphs and get value from the connections in their data, such as understanding how many patients suffer from both coughing and weight loss. Our customers use this technique to enable internal users to answer questions about their business practices based on a knowledge graph representing internal technical documents.
Neo4j and Microsoft Fabric: A New World of Data Exploration, Analysis, and Collaboration
Microsoft Fabric is a unified platform for data analytics, combining a range of data toolsets under one umbrella—and integrating with Neo4j unlocks its full potential, revolutionizing the data exploration and analysis enabled by Fabric.
In Fabric lakehouses, data is typically stored as files or as rows and columns in SQL Server tables, which may not adequately capture intricate relationships within datasets and can impede thorough analysis and insights.
Neo4j’s Graph Database represents data as nodes and relationships, enabling intuitive visualization and comprehensive exploration of connections within datasets. With built-in machine learning and AI algorithms, Neo4j helps organizations uncover hidden patterns and derive deeper insights from their data, ultimately driving better business outcomes. For example:
- Reduced operational burden on IT teams. IT teams running Neo4j AuraDB within the Fabric ecosystem can uncover hidden data patterns and actionable insights within their data far more quickly and easily.
- More informed business decision-making. By integrating the Neo4j BI Connector with Fabric Power BI, organizations can enable real-time data access across many data streams. Business decision-makers gain a deeper, more comprehensive understanding of data.
- Better customer experience and risk management. Running Neo4j Graph Data Science algorithms with Fabric Data Science allows deeper analysis of technical patterns to improve predictions and create node embeddings, offering insights into individual node relationships—invaluable for customer experience, risk management, and more.
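As a small illustration of the contrast between rows and columns and a graph, the sketch below takes a handful of tabular rows, turns them into nodes and relationships, and then finds a chain of connections that is awkward to see in the table itself. The account and device data, the column names, and the helper functions are all hypothetical; in Neo4j the same question would be answered with a Cypher path query rather than hand-written Python.

```python
from collections import defaultdict, deque

# Tabular view: each row links one account to one device (made-up data).
ROWS = [
    {"account": "A1", "device": "D1"},
    {"account": "A2", "device": "D1"},
    {"account": "A2", "device": "D2"},
    {"account": "A3", "device": "D2"},
    {"account": "A4", "device": "D3"},
]


def build_graph(rows):
    """Turn rows into an undirected graph of (label, id) nodes."""
    graph = defaultdict(set)
    for row in rows:
        account = ("Account", row["account"])
        device = ("Device", row["device"])
        graph[account].add(device)
        graph[device].add(account)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list or None if unconnected."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(ROWS)
# A1 and A3 never share a row, yet they are linked through A2's two devices.
print(shortest_path(graph, ("Account", "A1"), ("Account", "A3")))
# A4 uses a device nobody else uses, so there is no path.
print(shortest_path(graph, ("Account", "A1"), ("Account", "A4")))
```

In a relational table, finding the A1 to A3 link takes a chain of self-joins whose depth has to be known in advance. In a graph it is a single traversal, which is why patterns such as shared devices in fraud detection are a common fit for graph analysis.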
Current and upcoming Neo4j and Microsoft Fabric integrations include:
- Synapse Data Engineering module integration. By leveraging Python-based notebooks within Microsoft Fabric’s Synapse Data Engineering module, users can tap into Neo4j’s graph data seamlessly. The integration allows data scientists to import Neo4j libraries directly, enabling tasks such as reading, writing, and employing graph data science algorithms effortlessly.
- Neo4j Browser integration. Neo4j Browser is a developer-friendly interface for executing Cypher queries and visualizing results, facilitating ad-hoc graph queries and prototype development from the browser interface. With support for loading various file formats, including JSON from OneLake, users can easily import and manipulate data, enriching their analysis with Neo4j’s graph insights.
- Data Factory and Neo4j’s JDBC/ODBC drivers. By using these drivers within Data Factory, organizations can seamlessly transfer data between Neo4j and Fabric, enhancing data pipelines and facilitating efficient data processing workflows.
- Neo4j BI Connector for Power BI. Neo4j provides a Business Intelligence (BI) connector tailored for seamless integration with Power BI. With this connector, data from Neo4j can be queried using a SQL dialect, offering enhanced performance and flexibility. Because the connector takes advantage of Neo4j’s graph-native storage format and fast graph traversals, data retrieval can be significantly faster than with traditional relational databases.
- Microsoft Fabric workload integration (upcoming). Neo4j will soon be integrated as a native workload for Graph Analytics on the Microsoft Fabric analytics platform. This will enable users to access graph analytics workloads directly from the Microsoft Fabric console, create graph models from OneLake data, analyze graph data, run Graph Data Science algorithms using Neo4j Bloom as a pluggable UI component, and write results back into OneLake for a seamless end-to-end integration. This integration improves the user experience by blending the capabilities of both platforms.
Showing Neo4j Bloom UI plugged in within Microsoft Fabric console
Redefining What’s Possible in a World Shaped by AI
Delivering enterprise-grade analytics and AI isn’t easy. It requires continual access to clean data from an integrated data analytics and generative AI platform—and that platform needs database technology that can rapidly identify connections within complex datasets while ensuring that GenAI responses meet enterprise standards.
Neo4j enables Microsoft Fabric users to realize the full potential of GenAI and modern analytics—to overcome hallucinations and other GenAI challenges by grounding LLMs with domain-specific data and to deepen business insights by storing information in a graph structure, where intricate relationships within datasets are easy to model and query.
We’re excited about this new strategic partnership with Microsoft because it gives organizations a powerful integrated solution for staying ahead of the GenAI and data analytics curve, both now and for years into the future.
To get started with Neo4j within the Microsoft Fabric ecosystem, explore the GitHub repository for integration resources and find Neo4j in the Azure Marketplace today. Unlock the power of graph data to revolutionize your data analytics and drive innovation within your organization.