Dev Conference by Neo4j
Session Track: Data Science
Session description
Even by conservative estimates, 80% of enterprise data is stored in unstructured files of many different formats within data lakes. Traditional search engines can no longer meet users' information-seeking needs, especially when the task involves browsing and exploring to gain insights without clear search keywords. Large language models (LLMs) combined with Retrieval Augmented Generation (RAG) offer a solution to this problem.

Current RAG approaches split documents into chunks, embed them, and then run a semantic similarity search to retrieve relevant content. However, this can lead to an "information cocoon" problem, where you only ever see content that closely matches your question. For example, if you want to understand the profit for the last financial year, your question will retrieve the profit report, but it may not tell you why profit increased or decreased. That explanation might sit in the sections before or after the profit section of the report, and those sections may not be semantically similar to your question. In such cases, a document's layout structure is as important as its semantics when using LLMs for exploratory tasks, where you may not know exactly what you want to find out.

To address this, we developed an open-source package called Docs2KG (https://docs2kg.ai4wa.com/), which builds a multimodal knowledge graph with two aspects: one represents the structural relationships within the documents, and the other represents their semantic relationships. On this foundation, we can apply GraphRAG, a method proposed by Microsoft, to achieve better results, especially for exploratory questions.
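To make the retrieval gap concrete, here is a minimal Python sketch under loudly stated assumptions. It is not the Docs2KG API and not Microsoft's GraphRAG (which is considerably more involved); it only illustrates the dual-layer idea described above. Structural edges link each section to its neighbours in reading order, and semantic edges link any two sections whose similarity clears a threshold. A toy bag-of-words vector stands in for a real embedding model, and the names `Section`, `DualGraph`, `naive_rag`, `graph_rag`, the 0.3 threshold, and the sample report are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Section:
    """One chunk of a document, kept in reading order (hypothetical model)."""
    sid: str
    heading: str
    text: str


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: a lower-cased bag of words.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class DualGraph:
    sections: list
    vectors: dict = field(default_factory=dict)
    structural: dict = field(default_factory=dict)  # layout edges: previous/next section
    semantic: dict = field(default_factory=dict)    # similarity edges between any two sections

    @classmethod
    def build(cls, sections, threshold=0.3):
        g = cls(sections)
        for s in sections:
            g.vectors[s.sid] = embed(s.heading + " " + s.text)
            g.structural[s.sid] = set()
            g.semantic[s.sid] = set()
        # Structural layer: consecutive sections in the document are neighbours.
        for prev, nxt in zip(sections, sections[1:]):
            g.structural[prev.sid].add(nxt.sid)
            g.structural[nxt.sid].add(prev.sid)
        # Semantic layer: link any pair whose similarity clears the (assumed) threshold.
        for i, a in enumerate(sections):
            for b in sections[i + 1:]:
                if cosine(g.vectors[a.sid], g.vectors[b.sid]) >= threshold:
                    g.semantic[a.sid].add(b.sid)
                    g.semantic[b.sid].add(a.sid)
        return g


def naive_rag(graph, query, k=1):
    """Plain RAG retrieval: top-k sections by similarity to the query."""
    q = embed(query)
    ranked = sorted(graph.sections, key=lambda s: cosine(q, graph.vectors[s.sid]), reverse=True)
    return [s.sid for s in ranked[:k]]


def graph_rag(graph, query, k=1):
    """Seed with similarity hits, then add one-hop neighbours from both layers."""
    seeds = naive_rag(graph, query, k)
    keep = set(seeds)
    for sid in seeds:
        keep |= graph.structural[sid] | graph.semantic[sid]
    # Return in reading order so the LLM sees sections as laid out in the report.
    return [s.sid for s in graph.sections if s.sid in keep]


report = [  # hypothetical sample report
    Section("overview", "Overview", "Annual report, company at a glance."),
    Section("costs", "Cost drivers", "Cheaper raw materials and a new supplier contract cut expenses."),
    Section("profit", "Profit", "Net profit for the last financial year rose 12 percent."),
    Section("outlook", "Outlook", "We plan to expand into two new markets."),
    Section("segments", "Segment results", "Retail drove most of the profit growth this financial year."),
]
graph = DualGraph.build(report)
question = "What was the profit in the last financial year?"
print(naive_rag(graph, question))  # ['profit']
print(graph_rag(graph, question))  # ['costs', 'profit', 'outlook', 'segments']
```

Similarity search alone returns only the profit section. Expanding one hop along both layers also pulls in the adjacent cost-driver section (a structural neighbour with no word overlap with the question), the adjacent outlook section, and the distant segment results that share vocabulary with the profit section. A real system would use learned embeddings and a more selective expansion policy.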
PhD Candidate, University of Western Australia
Pascal Sun is a PhD candidate in the NLP-TLP Group at the University of Western Australia, where his research focuses on spatiotemporal knowledge graph question answering. Before that, Pascal spent five years as a full-stack developer, building several products for global markets. Seeking greater challenges and recognizing the imminent impact of AI, he began his PhD in late 2022. Pascal is passionate about harnessing AI to simplify everyday life. In the era of generative AI, he is committed to exploring how generative AI and large language models (LLMs) can unlock the potential within our data, bridging the gap between complex datasets and everyday people.