
Stronger RAG: The combination of vector databases and knowledge graphs


Limitations of traditional RAG

The classic RAG architecture centers on a vector database (VectorDB) that retrieves semantically similar context, allowing Large Language Models (LLMs) to acquire up-to-date knowledge without retraining. The workflow is shown in the following figure:

[Figure: the classic VectorRAG workflow]
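For reference, the retrieve-then-generate loop above can be expressed in a few lines of LlamaIndex code. This is a minimal sketch, assuming an OpenAI key in the environment and plain-text files in a local data directory:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load documents, chunk and embed them, and build an in-memory vector index
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# At query time: embed the question, retrieve the most similar chunks,
# and let the LLM answer with those chunks as context
query_engine = index.as_query_engine(similarity_top_k=3)
print(query_engine.query("What is our refund policy?"))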

This architecture is now widely used in all kinds of AI business scenarios, such as Q&A bots, intelligent customer service, and private-domain knowledge base retrieval. Although knowledge augmentation lets RAG mitigate the LLM hallucination problem to a certain extent, it still suffers from low accuracy. It is constrained by the inherent limitations of the information retrieval process (e.g., retrieval relevance, search algorithm efficiency, and the embedding model) and by its heavy dependence on LLM capability, both of which add uncertainty to the generated results.

The figure below is from a well-known paper in the RAG field, "Seven Failure Points When Engineering a Retrieval Augmented Generation System", which summarizes problems often encountered when using the RAG architecture (the red-boxed parts, which the authors call the seven failure points).

They include:

  • Missing Content: the answer to the user's question is not in the knowledge base, so the LLM gives a meaningless answer.
  • Missed Top Ranked: due to text chunking, chunk size, the embedding model, and so on, the top-k results retrieved from the vector database do not necessarily contain the most accurate answer.
  • Not in Context: similar to the previous point; after the various processing steps, the final context that gets assembled is of low quality.
  • Wrong Format: the retrieved results are not in the expected format, so the LLM cannot recognize and analyze them well.
  • Not Extracted: the prompt contains a lot of irrelevant information that impairs the LLM's judgment, i.e., the context is noisy.
  • Incomplete: the generated answer is not complete enough.
  • Incorrect Specificity: the response does contain an answer, but it is either not specific enough or too specific to meet the user's need.

All of the above problems have known mitigation techniques that improve answer accuracy, but the results still fall short of expectations.

Therefore, beyond matching text chunks by semantics, the industry is also exploring new forms of data retrieval, such as layering explicit relationships between data on top of semantics. These relationships differ from the strict logical dependencies of the relational model; they carry a degree of semantic association. For example, a person and a cell phone are two independent entities whose vector similarity is surely very low, but in real life they are closely related: if I search for a person's information, I am likely to also care about their surroundings, such as which phone model they use and whether they like taking photos or playing games.

Based on such a relational model, the context retrieved from the knowledge base can be more effective in certain scenarios, e.g., "Who in the company likes to take pictures with their cell phone?"
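To make this concrete, here is a toy sketch (all names and data are hypothetical) of how a relational model can answer a question that pure vector similarity would struggle with:

# Toy knowledge graph as (subject, relation, object) triples -- made-up data
triples = [
    ("Alice", "works_at", "ACME"),
    ("Bob", "works_at", "ACME"),
    ("Alice", "uses_phone", "Model X"),
    ("Alice", "hobby", "photography"),
    ("Bob", "hobby", "gaming"),
]

# "Who in the company likes to take pictures with their cell phone?"
# -> people who work at ACME, own a phone, and have photography as a hobby
people = {s for s, r, o in triples if r == "works_at" and o == "ACME"}
answer = [p for p in people
          if any(s == p and r == "uses_phone" for s, r, o in triples)
          and any(s == p and r == "hobby" and o == "photography" for s, r, o in triples)]
print(answer)  # ['Alice']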

Presenting Data Relationships with Knowledge Graphs

To effectively describe these abstract data relationships in a knowledge base, the concept of the knowledge graph is introduced. It no longer organizes data in two-dimensional tables but describes relationships with a graph structure that imposes no fixed schema constraints on them, similar to the following character relationship graph:

[Figure: example character relationship graph]

The diagram above contains the three most important elements: objects (people), attributes (identities), and relationships. To store such data there are specialized database products, namely graph databases.

GraphDB

Graph databases are a kind of NoSQL database with a more flexible schema definition, able to express arbitrary real-world associations simply and efficiently. The main concepts in a graph database are:

  • Entity: also called a vertex or node, an object in the graph structure
  • Edge: a connection between two entities, i.e., the relationship between them
  • Properties: used to concretely describe an entity or edge

These concepts map directly onto the character relationship diagram above.

At the level of schema definition and data manipulation, take the popular graph database NebulaGraph as an example:

# Create a graph space
CREATE SPACE IF NOT EXISTS test(vid_type=FIXED_STRING(256), partition_num=1, replica_factor=1);
USE test;

# Create the tag (vertex type), edge type, and an index on the tag
CREATE TAG IF NOT EXISTS entity(name string);
CREATE EDGE IF NOT EXISTS relationship(relationship string);
CREATE TAG INDEX IF NOT EXISTS entity_index ON entity(name(256));

# Write data
INSERT VERTEX entity (name) VALUES "1":("Digital China");
INSERT VERTEX entity (name) VALUES "2":("cloud base");
INSERT EDGE relationship (relationship) VALUES "1"->"2":("Established");
...

For queries, NebulaGraph's query language (nGQL) is compatible with openCypher-style MATCH syntax:

MATCH p=(n)-[*1..2]-() RETURN p LIMIT 100;

The statement above finds paths that start from any node and traverse 1 to 2 edges, returning at most 100 such paths.
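To run this query programmatically, the official nebula3-python client can be used. A minimal sketch, assuming a graphd service at 127.0.0.1:9669 and the default root/nebula credentials:

from nebula3.Config import Config
from nebula3.gclient.net import ConnectionPool

# Connect to the graphd service
pool = ConnectionPool()
pool.init([("127.0.0.1", 9669)], Config())
session = pool.get_session("root", "nebula")

# Run the path query against the graph space created above
session.execute("USE test")
result = session.execute("MATCH p=(n)-[*1..2]-() RETURN p LIMIT 100")
print(result)

session.release()
pool.close()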

Rendered visually, the query result is a graph of entities connected by their relationships, like the character diagram shown earlier.

GraphRAG

If we borrow the idea of VectorRAG and implement retrieval augmentation with a graph database, we arrive at the GraphRAG architecture. The overall process is no different from VectorRAG's, except that new knowledge is stored and retrieved via a knowledge graph, which addresses VectorRAG's weak understanding of abstract relationships.

GraphRAG can be implemented easily with the help of AI scaffolding tools. Here is a simple GraphRAG application built with LlamaIndex and the graph database NebulaGraph:

import os

from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    KnowledgeGraphIndex,
    StorageContext,
)
from llama_index.graph_stores.nebula import NebulaGraphStore
from llama_index.llms.openai import OpenAI

# Parameter preparation
os.environ["OPENAI_API_KEY"] = "sk-proj-xxx"
os.environ["NEBULA_USER"] = "root"
os.environ["NEBULA_PASSWORD"] = "nebula"
os.environ["NEBULA_ADDRESS"] = "10.:9669"  # replace with your graphd host:port

space_name = "heao"
edge_types, rel_prop_names = ["relationship"], ["relationship"]
tags = ["entity"]

llm = OpenAI(temperature=0, model="gpt-3.5-turbo")

Settings.llm = llm
Settings.chunk_size = 512

# Create the NebulaGraphStore instance
graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

# Sanity check: verify connectivity to the NebulaGraph cluster
hosts = graph_store.query("SHOW HOSTS")
print(hosts)

# Extract knowledge triples from the documents and write them to the graph database
documents = SimpleDirectoryReader(
    "data"
).load_data()

kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=2,
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
    max_knowledge_sequence=15,
)

# Retrieve from the graph and generate an answer
query_engine = kg_index.as_query_engine()
response = query_engine.query("Where is the Digital China Cloud Base?")
print(response)
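To inspect the triples that were extracted, the knowledge graph can also be visualized. A minimal sketch using pyvis, assuming pyvis and networkx are installed (the kg.html output name is arbitrary):

from pyvis.network import Network

# Export the extracted knowledge graph to networkx and render it as interactive HTML
g = kg_index.get_networkx_graph()
net = Network(directed=True)
net.from_nx(g)
net.show("kg.html")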

VectorRAG is good at fact-oriented questions but weak at understanding complex relationships; GraphRAG compensates for exactly that weakness, so combining the two should, in theory, give better results.

HybridRAG - Next Generation RAG Architecture

We can keep one copy of the data in a VectorDB and another in a GraphDB. After retrieving results related to the question through vector search and graph search respectively, we concatenate them into a unified context and pass the combined context to the large language model to generate a response. This is the HybridRAG architecture.
[Figure: the HybridRAG architecture]

A recent paper by Benika Hall et al. from the NVIDIA team, "HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction", presents this idea and validates it by comparison in financial-services scenarios such as data integration, risk management, and predictive analytics. VectorRAG performs well when the context needed to generate meaningful, coherent responses comes from relevant text documents, while GraphRAG generates more accurate, context-aware responses from the structured information extracted from financial documents; however, GraphRAG typically performs poorly on abstract question-answering tasks or when entities are not explicitly mentioned in the question.

The paper concludes with test results for VectorRAG, GraphRAG and HybridRAG:

In the table, F stands for faithfulness, which measures how well the generated answer can be inferred from the provided context; AR stands for answer relevance, which evaluates how well the generated answer addresses the original question; CP stands for context precision; and CR stands for context recall.
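These four metrics match those defined in the open-source RAGAS evaluation framework. A minimal sketch of computing them yourself with the ragas library (its 0.1-style API is assumed here, and the sample question/answer data is made up):

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness, answer_relevancy, context_precision, context_recall,
)

# Hypothetical evaluation sample: question, retrieved contexts, generated answer, reference
data = {
    "question": ["Where is the Digital China Cloud Base?"],
    "contexts": [["Digital China established the cloud base in ..."]],
    "answer": ["The cloud base is located in ..."],
    "ground_truth": ["..."],
}
result = evaluate(
    Dataset.from_dict(data),
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)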

Implementing a simple HybridRAG

The following example implements a simple HybridRAG process using LlamaIndex as the application framework, TiDB Vector as the vector database, and NebulaGraph as the graph database:

import os

from llama_index.core import (
    KnowledgeGraphIndex,
    VectorStoreIndex,
    StorageContext,
    Settings,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.graph_stores.nebula import NebulaGraphStore
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.tidbvector import TiDBVectorStore

# TiDB connection string; fill in your user, password, and host
TIDB_CONN_STR = "mysql+pymysql://:xx@:4000/test?ssl_ca=&ssl_verify_cert=true&ssl_verify_identity=true"

os.environ["OPENAI_API_KEY"] = "sk-proj-xxx"
os.environ["NEBULA_USER"] = "root"
os.environ["NEBULA_PASSWORD"] = "nebula"
os.environ["NEBULA_ADDRESS"] = "10.:9669"  # replace with your graphd host:port

llm = OpenAI(temperature=0, model="gpt-3.5-turbo")

Settings.llm = llm
Settings.chunk_size = 512

class HybridRAG:
    def __init__(self):
        # Initialize Vector Database
        tidbvec = TiDBVectorStore(
            connection_string=TIDB_CONN_STR,
            table_name="semantic_embeddings",
            distance_strategy="cosine",
            vector_dimension=1536, # The dimension is decided by the model
            drop_existing_table=False,
        )
        tidb_vec_index = VectorStoreIndex.from_vector_store(tidbvec)
        self.vector_db = tidb_vec_index
        
        # Initializing the Knowledge Graph
        graph_store = NebulaGraphStore(
            space_name="heao",
            edge_types=["relationship"],
            rel_prop_names=["relationship"],
            tags=["entity"],
        )
        storage_context = StorageContext.from_defaults(graph_store=graph_store)
        kg_index = KnowledgeGraphIndex.from_documents(
            [],
            storage_context=storage_context,
            max_triplets_per_chunk=2,
            max_knowledge_sequence=15,
        )
        self.kg_index = kg_index

        # Initialize the language model
        self.llm = llm

        # Initialize the embedding model
        self.embed_model = OpenAIEmbedding()

    def vector_search(self, query):
        # Search for relevant text in the vector database
        results = self.vector_db.as_retriever().retrieve(query)
        print(results)
        return [result.node.get_content() for result in results]

    def graph_search(self, query):
        # Search for relevant information in the knowledge graph
        results = self.kg_index.as_retriever().retrieve(query)
        print(results)
        return [result.node.get_content() for result in results]

    def generate_answer(self, query):
        vector_context = self.vector_search(query)
        graph_context = self.graph_search(query)
        combined_context = "\n".join(vector_context) + "\n" + "\n".join(graph_context)

        prompt = f"Question: {query}\nContext: {combined_context}\nAnswer:"
        return self.llm.complete(prompt)

# usage example
hybrid_rag = HybridRAG()
question = "TiDBFrom which version the resource management feature is supported?"
answer = hybrid_rag.generate_answer(question)
print(answer)

The implementation above simply concatenates the retrieval results from the vector and graph databases. In real applications, a reranker model can be introduced to further refine the context and improve its precision, as sketched below. Whichever RAG architecture is ultimately used, the goal is always the same: make the context passed to the large language model as complete and relevant as possible, so that it can give more accurate answers.
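For example, a cross-encoder reranker can be applied to the merged candidates before building the prompt. A minimal sketch using LlamaIndex's SentenceTransformerRerank postprocessor; it assumes you keep the raw NodeWithScore lists from both retrievers (the vector_nodes, graph_nodes, and question names are illustrative, as are the model name and top_n):

from llama_index.core.postprocessor import SentenceTransformerRerank

# Rerank the merged vector + graph candidates with a cross-encoder,
# keeping only the top_n most relevant nodes for the final context
reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-6-v2", top_n=5
)
reranked_nodes = reranker.postprocess_nodes(
    vector_nodes + graph_nodes,  # NodeWithScore lists from both retrievers
    query_str=question,
)
combined_context = "\n".join(n.node.get_content() for n in reranked_nodes)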

Summary

The RAG architecture has proven to be an effective way to enhance the capability of large language models. Since its accuracy cannot be maintained consistently across different business scenarios, a series of RAG variants has been derived, such as Advanced RAG, GraphRAG, HybridRAG, and RAG 2.0. These variants complement, optimize, enhance, or outright replace one another, showing that the optimization of LLM applications is still evolving rapidly.