
Revisiting GraphRAG: How to improve LLM summarization?

2024-08-07 18:59:12

author: Wang Zhenya

Editor's note:
Since Microsoft released GraphRAG, there have been many interpretation articles, including some excellent ones. For example, a while ago we reprinted Xue Ming's "A Source Code Interpretation of the Microsoft GraphRAG Framework", which gave a quick overview of GraphRAG's open-source code. This time we share the thinking of Wang Zhenya from Ant Group on how GraphRAG improves LLM summarization. The author gives an excellent interpretation of GraphRAG's sources of inspiration, its capabilities, and its application scenarios, and also discusses the application value of graph technology in depth. I believe this article will offer you something new.

Note: This article is reprinted with the author's full permission.

GraphRAG is a retrieval-augmented generation (RAG) method based on knowledge graphs. Microsoft open-sourced the GraphRAG project in early July, and within about a month it gained 13k stars.

GraphRAG performs better than typical RAG at providing high-level summarization and abstraction across multiple unstructured documents. For example, for a collection of articles on environmental issues, GraphRAG is better at answering questions such as "What are the top 5 themes of these articles?". For this type of question there are no directly relevant documents for a typical RAG to recall, so it is difficult for a typical RAG to handle.

Prior to GraphRAG, there were other approaches to this kind of problem. For example, the method described in the paper RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval clusters documents at multiple levels of abstraction and then generates summaries for subsequent RAG recall. RAPTOR is described in more detail later in this article.

How does GraphRAG differ from RAPTOR? Besides summarization questions, what other kinds of questions does GraphRAG handle well? What are its current shortcomings, and what does its design teach us about building RAG systems?

This article analyzes and introduces GraphRAG around these questions. The first part briefly introduces the problem GraphRAG solves and its original design intent, the second part focuses on GraphRAG's principles and concepts, and the last part presents some opinions and ideas.

GraphRAG is not an "original" innovation, but rather a clever combination of previously existing techniques. The techniques involved include LLMs, knowledge graphs, community detection and clustering algorithms, and some Map-Reduce ideas.

The essence of GraphRAG's design can be captured in a single sentence from its companion paper, "From Local to Global: A Graph RAG Approach to Query-Focused Summarization": "Use the natural modularity of graphs to partition data for global summarization."

The following section will expand on this statement and explain it.

1. Why GraphRAG?

1.1 What problem is GraphRAG solving?

Quoting the description in GraphRAG: Advanced Data Retrieval for Enhanced Insights:

1. Complex Information Traversal: It excels at connecting different pieces of information to provide new, synthesized insights.

2. Holistic Understanding: It performs better at understanding and summarizing large data collections, offering a more comprehensive grasp of the information.

The second point, "Holistic Understanding", is the "summarization" capability mentioned at the beginning of this article, i.e., the ability to handle QFS (query-focused summarization) questions, which require high-level summarization and abstraction across multiple documents. GraphRAG's paper focuses on this point, and by analyzing the code we can see that much of the design revolves around this capability.

The first point is the enhancement provided by the knowledge graph. When constructing the knowledge graph (described in detail in the next section), GraphRAG correlates information distributed across different articles and text fragments. When querying, it can recall this related corpus, whereas a typical RAG, lacking a pre-constructed knowledge graph, cannot fully recall the desired material. GraphRAG therefore performs better on questions that can only be answered by combining multiple pieces of the corpus.

The paper mainly discusses the enhancement of summarization capability, around which GraphRAG is evaluated. The "Complex Information Traversal" capability, which combines multiple pieces of information to provide new insights, is also exercised when performing QFS.

1.2 Can't we just use an LLM with a very large context for summarization?

The Claude 3 models have a 200K-token context; can't we just provide all the articles to the LLM at once for summarization? There are two problems with this.

One is the context size limit: a 200K-token window may still be insufficient when dealing with a large corpus, since the tokens of thousands of documents can easily exceed it. Moreover, the time and computational cost of processing hundreds of thousands of tokens per query is too high. As described later, GraphRAG uses hierarchical summarization, so the intermediate data only needs to be computed once and can then be reused.

Secondly, current LLMs exhibit the problem of "missing the point" or "ignoring some information" as the context becomes longer, as described in the GraphRAG paper:

The challenge remains, however, for query-focused abstractive summarization over an entire corpus. Such volumes of text can greatly exceed the limits of LLM context windows, and the expansion of such windows may not be enough given that information can be “lost in the middle” of longer contexts (Kuratov et al., 2024; Liu et al., 2023).

1.3 What are the differences compared to RAPTOR?

RAPTOR was also originally designed to solve the QFS problem; for its implementation principle, refer to the RAPTOR paper:
(Figure: the process by which RAPTOR constructs its tree)

RAPTOR constructs a multilayer tree before querying; its construction works as follows:

  1. Clustering proceeds upward from the bottom-level text chunks; after each round of clustering, an LLM summarizes each cluster and the summary is stored in a new node.
  2. Clustering is "soft clustering": a text chunk can be assigned to multiple clusters.
  3. Clustering is based on embedding vectors.
  4. The clustering algorithm first uses Uniform Manifold Approximation and Projection (UMAP) to reduce the dimensionality of the vectors, then clusters them with a Gaussian Mixture Model (GMM)-like approach (a minimal sketch follows this list).
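
To make the RAPTOR-style clustering concrete, below is a minimal sketch of soft clustering with UMAP dimensionality reduction followed by a GMM. It assumes the umap-learn and scikit-learn packages and that chunk embeddings have already been computed; the function name soft_cluster and the parameter values are illustrative, not taken from the RAPTOR or GraphRAG code.

```python
import numpy as np
import umap                                    # umap-learn
from sklearn.mixture import GaussianMixture

def soft_cluster(chunk_embeddings: np.ndarray, n_clusters: int = 10, threshold: float = 0.1):
    """Soft-cluster text-chunk embeddings: UMAP reduction, then GMM posteriors."""
    # Reduce the high-dimensional embeddings so the GMM fits a sensible density.
    reduced = umap.UMAP(n_components=10, metric="cosine").fit_transform(chunk_embeddings)
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(reduced)
    probs = gmm.predict_proba(reduced)         # shape: (n_chunks, n_clusters)
    # A chunk joins every cluster whose posterior exceeds the threshold,
    # so one chunk may appear in several clusters ("soft" clustering).
    return [np.where(p > threshold)[0].tolist() for p in probs]
```

Each cluster would then be summarized by the LLM and the summaries embedded again, forming the next layer of the tree.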

By constructing such a multi-layer tree, the query stage can, depending on the abstraction level of the user's question, put the summary information generated at different layers of the tree into the LLM context for reasoning, and thus answer relatively abstract, summary-level questions better.

What is the biggest difference between RAPTOR and GraphRAG? It lies in the way of clustering: GraphRAG builds a knowledge graph and then performs multi-layer clustering based on the relationships between nodes in the graph ("use the natural modularity of graphs to partition data for global summarization"), while RAPTOR's clustering is still based on embeddings.

How much do the two differ in effectiveness? I have not found a direct comparison. From first principles, however, I personally lean toward the GraphRAG approach. In previous projects using traditional RAG, embedding-based recall has not been ideal; if the embeddings are not good, then clustering based on those embeddings will, in theory, also have many problems (this is only my own analysis; in practice the two should be compared and evaluated).

How does GraphRAG build knowledge graphs and how is it aggregated based on graph node relationships? This is expanded in the next section.

2. Introduction to GraphRAG

As mentioned earlier, GraphRAG is similar to RAPTOR in that it pre-processes documents, performs hierarchical clustering and summarization, and uses the constructed data in the LLM context for inference at query time. GraphRAG can be divided into two parts: Indexing and Query.

2.1 Indexing

2.1.1 Basic process

A search engine built on inverted indexes must tokenize all crawled documents and construct an inverted index for the subsequent keyword-search phase. GraphRAG likewise needs to process all documents, but instead of tokenizing and building an inverted index, it uses an LLM with special prompts to extract entities and relationships and construct a knowledge graph.

The purpose of constructing the knowledge graph is not just to add related information to the LLM context during RAG inference, but also to perform multiple layers of clustering and prepare intermediate data so that users' QFS questions can be answered better.
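
As a rough illustration of the extraction step (not GraphRAG's actual prompt or code), a minimal sketch using the OpenAI Python client might look like the following; the prompt wording, the model name, and the output schema are placeholders.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

EXTRACT_PROMPT = (
    "Extract entities and relationships from the text below. "
    'Return JSON: {"entities": [{"name": ..., "type": ..., "description": ...}], '
    '"relationships": [{"source": ..., "target": ..., "description": ...}]}\n\nText:\n'
)

def extract_graph_elements(text_chunk: str) -> dict:
    """Ask the LLM for entities/relationships in one text chunk and parse the JSON reply."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": EXTRACT_PROMPT + text_chunk}],
    )
    return json.loads(resp.choices[0].message.content)
```

GraphRAG's real prompts are considerably richer (they also extract claims and detailed descriptions), but the overall shape is the same: one LLM call per text chunk, structured output, and a merge of all extracted elements into one graph.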

GraphRAG has a complete built-in pipeline, and during the indexing phase, the main processes are as follows:

  1. Extract entities, relationships and claims (specific descriptions of relationships between entities and other entities) based on the raw text.
  2. Perform community detection on entities, which can be simply understood as clustering.
  3. Generate community summaries and community reports (more detailed than summaries) at multiple levels of granularity.
  4. Embed entities into a graph vector space.
  5. Embed text fragments into a text vector space.

A demo visualization of the structure of a constructed knowledge graph:

A few notes:

  1. Nodes of the same color belong to the same community.
  2. Communities are hierarchical; the higher-level communities on the left have far fewer colors than the sub-communities on the right. This is similar to the hierarchy in RAPTOR.
  3. Similar to RAPTOR, clustering is also performed from the bottom up.
  4. The clustering algorithm used is Leiden (a minimal sketch of hierarchical Leiden clustering follows this list).
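
Below is a minimal sketch of bottom-up, hierarchical community detection in the spirit of GraphRAG, using python-igraph and leidenalg (GraphRAG itself uses the graspologic library; the function and parameter names here are illustrative).

```python
import igraph as ig
import leidenalg

def hierarchical_communities(edges, max_size=10):
    """Run Leiden once, then re-partition any community larger than max_size.

    `edges` is a list of (source, target) node-name pairs; returns a list of levels,
    where each level maps node name -> community label at that level.
    """
    graph = ig.Graph.TupleList(edges, directed=False)
    levels, frontier = [], [graph]
    while frontier:
        level, next_frontier = {}, []
        for g in frontier:
            part = leidenalg.find_partition(g, leidenalg.ModularityVertexPartition)
            for cid, members in enumerate(part):
                names = [g.vs[i]["name"] for i in members]
                for name in names:
                    level[name] = f"{id(g)}-{cid}"   # label unique to this subgraph + cluster
                # Communities that are still too large are split again at the next level,
                # but only if Leiden actually managed to split the subgraph.
                if len(names) > max_size and len(part) > 1:
                    next_frontier.append(g.subgraph(members))
        levels.append(level)
        frontier = next_frontier
    return levels
```

Each level's communities are then summarized by the LLM into community reports, which become the intermediate data used by Global Search.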

2.1.2 Indexing Dataflow Analysis

Refer to the official documentation (Indexing Dataflow); the main phases are as follows:

GraphRAG Indexing Dataflow

I will not expand on every phase in detail; the general function of most stages can be understood from their names, and the official documentation covers them thoroughly. A few phases whose purpose may not be obvious from the name deserve attention:

  1. Phase 3, Graph Augmentation: the structure of the graph is embedded (using the Node2Vec algorithm) so that related information can be recalled in the subsequent Query phase (a minimal sketch follows this list).
  2. Phase 4, Community Summarization: the community summaries are also embedded, again for recall at query time.
  3. Phase 5, Document Processing: the associations between Text Units and Documents are saved into the graph. With this relationship available in the context during LLM reasoning, the output can clearly state which documents it is based on. On the one hand, we can check the original documents to judge whether the LLM output is hallucinated; on the other hand, when more detail is needed, we can link to the sources directly.
  4. Phase 6, Network Visualization: since the generated graph is generally not a planar graph (one that can be drawn on a plane without edge crossings), UMAP (a dimensionality-reduction technique) is used to map it onto a plane so that the structure and patterns of the data can be observed more intuitively.
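
For the Graph Augmentation step, a minimal sketch of graph-structure embedding with the node2vec package might look as follows; GraphRAG's own implementation and parameters differ, and the dimensions and walk settings below are illustrative.

```python
import networkx as nx
from node2vec import Node2Vec  # pip install node2vec

def embed_graph(graph: nx.Graph, dimensions: int = 64) -> dict:
    """Learn one vector per node from random walks over the graph structure."""
    n2v = Node2Vec(graph, dimensions=dimensions, walk_length=30, num_walks=100, workers=2)
    model = n2v.fit(window=10, min_count=1)           # gensim Word2Vec under the hood
    # Nodes that are close in the graph end up with similar vectors, which lets
    # query-time retrieval pull in structurally related entities.
    return {node: model.wv[str(node)] for node in graph.nodes()}
```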

2.2 Query

Like RAPTOR, GraphRAG also uses different clustering levels depending on the question at query time. The difference is that GraphRAG defines two very different query methods.

One is Local Search, used for specific, relatively detail-oriented questions. The context for this kind of query mainly consists of content from the knowledge graph and the original Text Units. After this information is combined, the context is built once and the LLM is invoked once for inference.

The other is Global Search, mainly used for summary-level, relatively abstract questions. Its query context uses the Community Reports; because the Community Reports contain too many tokens to fit into the context at once, a Map-Reduce approach is adopted to avoid losing information.

Both are briefly described below.

2.2.1 Local Search

From the dataflow above we can see that the context contains many kinds of content. First, the user query is embedded to retrieve relevant entities from the knowledge graph; then multiple pieces of information related to those entities are ranked, and selected parts are put into the context, including:

  1. Associated raw text (Text Units)
  2. Entity-associated Community Reports
  3. Associated entities (Entities)
  4. Entity-related relationship information
  5. Attributes of the entities (Covariates; e.g., if the entity is an apple, its color can be viewed as an attribute)

Together with the session history, a single query uses far more tokens than the usual RAG approach.
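
A highly simplified sketch of how such a context might be assembled; the real build_context in GraphRAG handles token budgets, ranking and many more details, and the store helpers below are hypothetical, not GraphRAG APIs.

```python
def build_local_context(query_embedding, store, top_k=10, history=None):
    """Assemble a Local Search context from graph-derived data (illustrative only)."""
    entities = store.nearest_entities(query_embedding, top_k)   # hypothetical vector lookup
    sections = {
        "Entities":      [e.description for e in entities],
        "Relationships": [r.description for e in entities for r in store.relationships(e)],
        "Reports":       [store.community_report(e) for e in entities],
        "Sources":       [store.text_units(e) for e in entities],   # original text chunks
    }
    parts = [f"-----{name}-----\n" + "\n".join(map(str, rows)) for name, rows in sections.items()]
    if history:
        parts.insert(0, "-----Conversation history-----\n" + "\n".join(history))
    return "\n\n".join(parts)
```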

The code that constructs the context is build_context; the resulting context matches the dataflow above. After the context is constructed, inference uses the prompt below:

"""Local search system prompts."""

LOCAL_SEARCH_SYSTEM_PROMPT = """
---Role---

You are a helpful assistant responding to questions about data in the tables provided.


---Goal---

Generate a response of the target length and format that responds to the user's question, summarizing all information in the input data tables appropriate for the response length and format, and incorporating any relevant general knowledge.

If you don't know the answer, just say so. Do not make anything up.

Points supported by data should list their data references as follows:

"This is an example sentence supported by multiple data references [Data: <dataset name> (record ids); <dataset name> (record ids)]."

Do not list more than 5 record ids in a single reference. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more.

For example:

"Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Sources (15, 16), Reports (1), Entities (5, 7); Relationships (23); Claims (2, 7, 34, 46, 64, +more)]."

where 15, 16, 1, 5, 7, 23, 2, 7, 34, 46, and 64 represent the id (not the index) of the relevant data record.

Do not include information where the supporting evidence for it is not provided.


---Target response length and format---

{response_type}


---Data tables---

{context_data}


---Goal---

Generate a response of the target length and format that responds to the user's question, summarizing all information in the input data tables appropriate for the response length and format, and incorporating any relevant general knowledge.

If you don't know the answer, just say so. Do not make anything up.

Points supported by data should list their data references as follows:

"This is an example sentence supported by multiple data references [Data: <dataset name> (record ids); <dataset name> (record ids)]."

Do not list more than 5 record ids in a single reference. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more.

For example:

"Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Sources (15, 16), Reports (1), Entities (5, 7); Relationships (23); Claims (2, 7, 34, 46, 64, +more)]."

where 15, 16, 1, 5, 7, 23, 2, 7, 34, 46, and 64 represent the id (not the index) of the relevant data record.

Do not include information where the supporting evidence for it is not provided.


---Target response length and format---

{response_type}

Add sections and commentary to the response as appropriate for the length and format. Style the response in markdown.
"""

where {context_data} is a placeholder variable for the context constructed by the above dataflow.
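
Putting it together, a Local Search call is essentially one prompt fill plus one LLM call. A hedged sketch (the model name and client usage are assumptions, not GraphRAG's code):

```python
from openai import OpenAI

client = OpenAI()

def local_search(query: str, context_data: str, response_type: str = "multiple paragraphs") -> str:
    """One-shot Local Search: fill the system prompt and call the LLM once."""
    system = LOCAL_SEARCH_SYSTEM_PROMPT.format(context_data=context_data, response_type=response_type)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": query}],
    )
    return resp.choices[0].message.content
```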

A Local Search calls the LLM once and is done, but a Global Search query may call the LLM a dozen or more times. Here is how Global Search works.

2.2.2 Global Search

As mentioned earlier, the context used by Global Search is very different from Local Search: it uses the collection of Community Reports at a chosen community level. Since these Community Reports may not fit into a single context, a Map-Reduce procedure is required:

  1. Map: split the Community Reports into multiple batches; each batch is sent to the LLM concurrently to reason over the user query and summarize a few key points. The LLM also assigns an importance score to each point to facilitate the final reduce step.
  2. Reduce: merge the points produced by all batches and perform a reduce step over them, again using the LLM for the final summarization (a minimal sketch of this Map-Reduce flow follows this list).
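
A minimal sketch of this Map-Reduce flow, assuming an llm(system, user) helper that wraps a chat-completion call and the MAP/REDUCE prompts shown below; batching, scoring and concurrency are simplified relative to GraphRAG's implementation.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def global_search(query, community_reports, llm, batch_tokens=8000):
    """Map: score key points per report batch; Reduce: merge the top points into one answer."""
    batches, current, size = [], [], 0
    for report in community_reports:                     # naive token-budget batching
        current.append(report); size += len(report) // 4
        if size >= batch_tokens:
            batches.append("\n\n".join(current)); current, size = [], 0
    if current:
        batches.append("\n\n".join(current))

    def map_call(batch):
        out = llm(MAP_SYSTEM_PROMPT.format(context_data=batch), query)
        return json.loads(out).get("points", [])

    with ThreadPoolExecutor() as pool:                    # map phase runs concurrently
        points = [p for pts in pool.map(map_call, batches) for p in pts]

    points.sort(key=lambda p: p["score"], reverse=True)   # highest-importance points first
    report_data = "\n".join(p["description"] for p in points if p["score"] > 0)
    return llm(REDUCE_SYSTEM_PROMPT.format(report_data=report_data,
                                           response_type="multiple paragraphs"), query)
```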

The map-phase prompt:

"""System prompts for global search."""

MAP_SYSTEM_PROMPT = """
---Role---

You are a helpful assistant responding to questions about data in the tables provided.


---Goal---

Generate a response consisting of a list of key points that responds to the user's question, summarizing all relevant information in the input data tables.

You should use the data provided in the data tables below as the primary context for generating the response.
If you don't know the answer or if the input data tables do not contain sufficient information to provide an answer, just say so. Do not make anything up.

Each key point in the response should have the following element:
- Description: A comprehensive description of the point.
- Importance Score: An integer score between 0-100 that indicates how important the point is in answering the user's question. An 'I don't know' type of response should have a score of 0.

The response should be JSON formatted as follows:
{{
    "points": [
        {{"description": "Description of point 1 [Data: Reports (report ids)]", "score": score_value}},
        {{"description": "Description of point 2 [Data: Reports (report ids)]", "score": score_value}}
    ]
}}

The response shall preserve the original meaning and use of modal verbs such as "shall", "may" or "will".

Points supported by data should list the relevant reports as references as follows:
"This is an example sentence supported by data references [Data: Reports (report ids)]"

**Do not list more than 5 record ids in a single reference**. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more.

For example:
"Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Reports (2, 7, 64, 46, 34, +more)]. He is also CEO of company X [Data: Reports (1, 3)]"

where 1, 2, 3, 7, 34, 46, and 64 represent the id (not the index) of the relevant data report in the provided tables.

Do not include information where the supporting evidence for it is not provided.


---Data tables---

{context_data}

---Goal---

Generate a response consisting of a list of key points that responds to the user's question, summarizing all relevant information in the input data tables.

You should use the data provided in the data tables below as the primary context for generating the response.
If you don't know the answer or if the input data tables do not contain sufficient information to provide an answer, just say so. Do not make anything up.

Each key point in the response should have the following element:
- Description: A comprehensive description of the point.
- Importance Score: An integer score between 0-100 that indicates how important the point is in answering the user's question. An 'I don't know' type of response should have a score of 0.

The response shall preserve the original meaning and use of modal verbs such as "shall", "may" or "will".

Points supported by data should list the relevant reports as references as follows:
"This is an example sentence supported by data references [Data: Reports (report ids)]"

**Do not list more than 5 record ids in a single reference**. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more.

For example:
"Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Reports (2, 7, 64, 46, 34, +more)]. He is also CEO of company X [Data: Reports (1, 3)]"

where 1, 2, 3, 7, 34, 46, and 64 represent the id (not the index) of the relevant data report in the provided tables.

Do not include information where the supporting evidence for it is not provided.

The response should be JSON formatted as follows:
{{
    "points": [
        {{"description": "Description of point 1 [Data: Reports (report ids)]", "score": score_value}},
        {{"description": "Description of point 2 [Data: Reports (report ids)]", "score": score_value}}
    ]
}}
"""

From the above prompt you can see that the output is in JSON format:

{
    "points": [
        {
            "description": "Description of point 1 [Data: Reports (report ids)]",
            "score": score_value
        },
        {
            "description": "Description of point 2 [Data: Reports (report ids)]",
            "score": score_value
        }
    ]
}

The reduce-phase prompt:

"""Global Search system prompts."""

REDUCE_SYSTEM_PROMPT = """
---Role---

You are a helpful assistant responding to questions about a dataset by synthesizing perspectives from multiple analysts.


---Goal---

Generate a response of the target length and format that responds to the user's question, summarize all the reports from multiple analysts who focused on different parts of the dataset.

Note that the analysts' reports provided below are ranked in the **descending order of importance**.

If you don't know the answer or if the provided reports do not contain sufficient information to provide an answer, just say so. Do not make anything up.

The final response should remove all irrelevant information from the analysts' reports and merge the cleaned information into a comprehensive answer that provides explanations of all the key points and implications appropriate for the response length and format.

Add sections and commentary to the response as appropriate for the length and format. Style the response in markdown.

The response shall preserve the original meaning and use of modal verbs such as "shall", "may" or "will".

The response should also preserve all the data references previously included in the analysts' reports, but do not mention the roles of multiple analysts in the analysis process.

**Do not list more than 5 record ids in a single reference**. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more.

For example:

"Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Reports (2, 7, 34, 46, 64, +more)]. He is also CEO of company X [Data: Reports (1, 3)]"

where 1, 2, 3, 7, 34, 46, and 64 represent the id (not the index) of the relevant data record.

Do not include information where the supporting evidence for it is not provided.


---Target response length and format---

{response_type}


---Analyst Reports---

{report_data}


---Goal---

Generate a response of the target length and format that responds to the user's question, summarize all the reports from multiple analysts who focused on different parts of the dataset.

Note that the analysts' reports provided below are ranked in the **descending order of importance**.

If you don't know the answer or if the provided reports do not contain sufficient information to provide an answer, just say so. Do not make anything up.

The final response should remove all irrelevant information from the analysts' reports and merge the cleaned information into a comprehensive answer that provides explanations of all the key points and implications appropriate for the response length and format.

The response shall preserve the original meaning and use of modal verbs such as "shall", "may" or "will".

The response should also preserve all the data references previously included in the analysts' reports, but do not mention the roles of multiple analysts in the analysis process.

**Do not list more than 5 record ids in a single reference**. Instead, list the top 5 most relevant record ids and add "+more" to indicate that there are more.

For example:

"Person X is the owner of Company Y and subject to many allegations of wrongdoing [Data: Reports (2, 7, 34, 46, 64, +more)]. He is also CEO of company X [Data: Reports (1, 3)]"

where 1, 2, 3, 7, 34, 46, and 64 represent the id (not the index) of the relevant data record.

Do not include information where the supporting evidence for it is not provided.


---Target response length and format---

{response_type}

Add sections and commentary to the response as appropriate for the length and format. Style the response in markdown.
"""

NO_DATA_ANSWER = (
    "I am sorry but I am unable to answer this question given the provided data."
)

GENERAL_KNOWLEDGE_INSTRUCTION = """
The response may also include relevant real-world knowledge outside the dataset, but it must be explicitly annotated with a verification tag [LLM: verify]. For example:
"This is an example sentence supported by real-world knowledge [LLM: verify]."
"""

The sentence "Note that the analysts' reports provided below are ranked in the descending order of importance" in the prompt indicates that, before being handed to the LLM, the points produced in the map stage are sorted in descending order by score.

The above is the main flow of Global Search. Besides answering questions at different levels, GraphRAG can also recommend questions thanks to its use of the knowledge graph; this capability is briefly analyzed in the next subsection.

2.3 Question Generation

Thanks to the knowledge graph, complex relationship connections are available and can also be used for question recommendation. For example, if a user asks "Where should I go when traveling in Hangzhou?", questions like "What are the best foods in Hangzhou?" or "Which historical figures are associated with Hangzhou?" can be recommended based on the relationships in the graph.

Knowledge graphs were already used for recommendation in quite a few recommender systems before GraphRAG. The slight difference is that GraphRAG uses LLM generation and some unique context. In GraphRAG, question recommendation uses the same context as Local Search, with the prompt below:



"""Question Generation system prompts."""

QUESTION_SYSTEM_PROMPT = """
---Role---

You are a helpful assistant generating a bulleted list of {question_count} questions about data in the tables provided.


---Data tables---

{context_data}


---Goal---

Given a series of example questions provided by the user, generate a bulleted list of {question_count} candidates for the next question. Use - marks as bullet points.

These candidate questions should represent the most important or urgent information content or themes in the data tables.

The candidate questions should be answerable using the data tables provided, but should not mention any specific data fields or data tables in the question text.

If the user's questions reference several named entities, then each candidate question should reference all named entities.

---Example questions---
"""

2.4 Summary

A few questions to briefly summarize and conclude this chapter.

2.4.1 What is the difference between GraphRAG, RAG, and RAPTOR?

| | GraphRAG | RAG | RAPTOR |
|---|---|---|---|
| Applicable scenarios | 1. Specific questions; 2. Summary questions | 1. Specific questions; 2. Less effective on summary questions | Summary-type questions |
| Preprocessing | 1. Uses an LLM to build a knowledge graph; 2. Embeds the nodes of the knowledge graph; 3. Uses a community clustering algorithm plus an LLM to build a multi-layer community structure on top of the graph, with embeddings | Text chunking and embedding | 1. Builds multi-layer clusters; 2. Embeds the text in each cluster |
| Preprocessing cost | Very high: every text chunk must be passed through the LLM to build the knowledge graph, and building communities also requires LLM calls | Low: only low-cost embedding is needed | High: all text must be summarized by the LLM at multiple layers |
| Query | 1. Two kinds of query: Local Search and Global Search; 2. Global Search requires multiple concurrent LLM calls via Map-Reduce | Single approach: one query, one LLM call | Single approach, but can use clusters at different levels; one query, one LLM call |
| Query cost | 1. Local Search: high (the context needs many kinds of information); 2. Global Search: very high (large context, multiple LLM calls) | Low: putting the recalled text into the context is sufficient | Generally the contents of a cluster are placed in the context |

2.4.2 How does a knowledge graph constructed by GraphRAG differ from a general knowledge graph?

| | Knowledge graph built by GraphRAG | General knowledge graph |
|---|---|---|
| Construction method | Built with an LLM | Generally built with machine-learning algorithms |
| Nodes and edges | Entities and relationships carry detailed descriptions and embeddings; see the code: entity, relationship | Usually a simple description, with no embedding information |
| Inference | Relationships, entities and their descriptions can be put into the LLM context, and "indirect" reasoning can be performed by the LLM | Reasoning with standardized input requirements, e.g. "China" + "flag" as a specific inference condition |
| How it is used | 1. Community clustering algorithms run over the graph to build multi-layer communities; 2. Recommendation generation | Broader: search-engine optimization, recommendation, anti-money-laundering and fraud detection in financial risk control |

3. Some thoughts and ideas

3.1 Correctness is better than response time

This idea was mentioned when introducing agentic workflows (TODO); I thought about it again while studying GraphRAG, whose Indexing and Query processes can both be understood as a kind of workflow.

When reading the Global Query section of the official documentation, my first reaction on seeing that Map-Reduce was used was "this will be time-consuming" (MR is not inherently slow; it is just that offline analysis with QDPS in daily work is relatively slow, which shaped my own cognitive bias). Being time-consuming is not necessarily a problem: just like offline data analysis with QDPS, you need to balance accuracy, cost and latency.

"Correctness is better than response time" should be the design philosophy of some LLM products, but few LLM products are designed based on this philosophy. Many products are designed with a focus on response time to the detriment of another important dimension of the user experience - accuracy.

However, putting the concept of "correctness over response time" into practice can present the following challenges:

  1. Acceptance by product design teams: Response times may be extended from the original second level to the minute or even hour level. For product designers, designing a product with a response time of minutes when other products are aiming for a second response is certainly a huge challenge.
  2. High-cost problem: If a task requires LLM to perform multiple sessions and multiple iterations of reasoning, this will consume a lot of computational resources, and the cost of each task may be as high as several tens of RMB, thus bringing considerable cost pressure.
  3. User experience assurance: As response times increase, the question of how to maintain as good a user experience as possible becomes a major issue. Is it better to provide users with multiple choices (i.e., choosing options with longer response times but higher accuracy, or shorter response times but average quality), or to change the way the product interacts with offline processing?

"Correctness over response time is not just a technical compromise, but will become a design philosophy for more and more products as LLM becomes more popular. Correctness over response time will be accepted as a product design as users get better quality and more accurate results with this "time-consuming" product.

3.2 Graphs can be used to implement query rewriting

In many dialogue scenarios, the user query is rewritten to achieve better results. One kind of rewriting is similar to "expansion": for example, "Introduce Hangzhou" can be split into several smaller questions such as "Geographic information about Hangzhou", "Hangzhou's economic situation" and "Hangzhou's history and culture". Each sub-question is then embedded separately, the recalled information is put into the LLM context for inference, rather than directly matching against the embedding of "Introduce Hangzhou", which may recall nothing or recall incomplete information.

Splitting a large question into several smaller ones, recalling the corpus for each and then summarizing can also improve the comprehensiveness of RAG answers to summarization-type questions. The entity relationships in the knowledge graph can also be used when decomposing the question (see the sketch below).
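
A minimal sketch of this kind of query expansion, assuming an llm() chat helper and embed()/recall() retrieval helpers; all names are illustrative, not GraphRAG APIs. Here the decomposition is done purely by the LLM; with a knowledge graph, the neighbors of the entities mentioned in the query could be used to propose the sub-questions instead.

```python
def answer_with_expansion(query, llm, embed, recall, n_sub=3):
    """Decompose a broad query into sub-questions, recall text for each, then summarize."""
    sub_questions = llm(
        f"Split the question '{query}' into {n_sub} more specific sub-questions, one per line."
    ).splitlines()

    evidence = []
    for sq in sub_questions:
        evidence.extend(recall(embed(sq), top_k=5))    # recall chunks per sub-question

    context = "\n\n".join(dict.fromkeys(evidence))     # de-duplicate while keeping order
    return llm(f"Answer the question '{query}' using only the material below:\n\n{context}")
```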

3.3 Graphs are the best data structure for QA libraries

Before LLMs, bots maintained many mappings from "intent" to "standard answer". To prevent LLM hallucination and improve performance, many of today's LLM-based dialogue bots also maintain a large number of QA pairs (question plus standard answer), so that when the user's intent is very clear and matches an existing question semantically, the answer can be returned quickly.

For such manually maintained QA libraries, where relationships exist between QA pairs, graphs are the best data structure. QA pairs generally also revolve around some topic (entity) with multiple levels and aspects. When LLM capabilities are used, whether for recommending related questions, expanding the user query, or summarizing answers, a graph structure makes it easy to obtain more related context and thus better results during LLM reasoning (a small sketch follows).
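
A small illustration of a graph-structured QA library using networkx; the schema, field names and sample content are my own, not from GraphRAG.

```python
import networkx as nx

qa_graph = nx.Graph()
# Topic (entity) nodes and QA nodes, connected by "about" edges.
qa_graph.add_node("Hangzhou", kind="topic")
qa_graph.add_node("q1", kind="qa", question="Where should I go when traveling in Hangzhou?",
                  answer="West Lake, Lingyin Temple, ...")
qa_graph.add_node("q2", kind="qa", question="What are the best foods in Hangzhou?",
                  answer="Longjing shrimp, Dongpo pork, ...")
qa_graph.add_edge("q1", "Hangzhou", relation="about")
qa_graph.add_edge("q2", "Hangzhou", relation="about")

def related_questions(matched_qa: str):
    """Walk the topic neighbors of a matched QA node to recommend related questions."""
    topics = [n for n in qa_graph.neighbors(matched_qa) if qa_graph.nodes[n]["kind"] == "topic"]
    return [qa_graph.nodes[q]["question"]
            for t in topics for q in qa_graph.neighbors(t)
            if q != matched_qa and qa_graph.nodes[q]["kind"] == "qa"]

print(related_questions("q1"))  # -> ["What are the best foods in Hangzhou?"]
```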

3.4 Scenarios for GraphRAG

3.4.1 Data analysis: trend analysis/opinion perception

Trend analysis over large volumes of text is well suited to GraphRAG: it builds the knowledge graph bottom-up, making it easy to spot hot trends and newly emerging topics, and community clustering provides a systematic description and analysis of those trends.

3.4.2 Customer-service bots in specialized domains

Knowledge in specialized domains is often systematic and multi-layered. Providing in-depth answers to specialized questions requires not only a high-quality corpus, but also a representation of the relationships within the corpus, so that questions at different levels can be answered. The corpus in a specialized domain is relatively limited, so the cost of constructing such a knowledge graph is manageable.

4. Summary

This article has described the fundamentals of GraphRAG and the problems it solves.

For the scenario of analyzing and summarizing large amounts of dispersed material, GraphRAG is one of the few very practical solutions available until there is a major breakthrough in LLM context capabilities.

If the job involves extracting ideas and sensing trends from large amounts of text, GraphRAG is well worth investing time in researching.

Above, thanks for reading.

5. References

  1. GitHub repository: https://github.com/microsoft/graphrag
  2. Paper: https://arxiv.org/pdf/2404.16130
  3. Microsoft Blog: https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
  4. Official documentation: https://microsoft.github.io/graphrag/
  5. GraphRAG: Advanced Data Retrieval for Enhanced Insights: /graphrag-advanced-data-retrieval-for-enhanced-insights-bcef777404d2
  6. RAPTOR paper: https://arxiv.org/pdf/2401.18059
  7. LangChain RAPTOR implementation: https://github.com/langchain-ai/langchain/blob/master/cookbook/
  8. LangChain RAPTOR video: https://www.youtube.com/watch?v=jbGchdTL7d0