Back to Blog

Deep Research: Knowledge-Graph-Augmented Retrieval Systems

Core Architecture: Microsoft GraphRAG (published 2024, open-sourced on GitHub) builds a hierarchical knowledge graph from source documents using a multi-stage LLM-driven pipeline:

Dhawal ChhedaAI Leader at Accel4

Deep Research: Knowledge-Graph-Augmented Retrieval Systems

LightRAG, GraphRAG (Microsoft), and RAPTOR — Comprehensive Analysis


1. SYSTEM ARCHITECTURES

1.1 Microsoft GraphRAG

Core Architecture:
Microsoft GraphRAG (published 2024, open-sourced on GitHub) builds a hierarchical knowledge graph from source documents using a multi-stage LLM-driven pipeline:

  1. Source Document Chunking — Documents are split into text chunks (default ~300 tokens with overlap).
  2. Entity & Relationship Extraction — An LLM extracts entities (people, places, concepts) and relationships from each chunk, producing a graph of (entity, relationship, entity) triples.
  3. Leiden Community Detection — The graph is partitioned into hierarchical communities using the Leiden algorithm. This creates a multi-level hierarchy: fine-grained clusters at the bottom, broad thematic clusters at the top.
  4. Community Summarization — Each community gets an LLM-generated summary describing its key entities, relationships, and themes.
  5. Query Processing — Two query modes:
    - Local Search: Retrieves relevant entities/relationships near the query, builds context from their community summaries and source text, then generates an answer. Best for specific, factual questions.
    - Global Search: Fans out across all community summaries at a chosen hierarchy level, generates partial answers from each, then synthesizes a final answer via map-reduce. Best for holistic, thematic questions (“What are the main themes in this dataset?”).

Key Design Decisions:
- Graph structure is fully LLM-derived (no traditional NLP/NER pipeline).
- Leiden community detection provides multi-resolution abstraction.
- Heavy upfront indexing cost; query-time cost depends on mode.
- Uses a “claims” extraction step for factual grounding.

1.2 LightRAG

Core Architecture:
LightRAG (published late 2024 by Guo et al., University of Hong Kong) was designed explicitly to address GraphRAG’s cost and complexity problems while retaining graph-augmented retrieval benefits:

  1. Dual-Level Retrieval Paradigm — Operates at two granularities:
    - Low-Level (Specific): Retrieves precise entities and their direct relationships. Answers specific factual queries.
    - High-Level (Abstract): Retrieves higher-order themes, topics, and cross-document patterns. Answers broad thematic queries.
  2. Graph Construction — Like GraphRAG, uses LLM-based entity/relationship extraction, but with a streamlined single-pass approach (no multi-stage community detection).
  3. Deduplication & Merging — Aggressively deduplicates entities and merges equivalent nodes, keeping the graph compact.
  4. Hybrid Retrieval — Combines vector similarity search (embedding-based) with graph traversal. For a query, it:
    - Finds relevant entities/relationships via embedding search.
    - Traverses the graph neighborhood for related context.
    - Synthesizes from both vector-retrieved text and graph-retrieved structure.
  5. Incremental Indexing — Supports adding new documents without rebuilding the entire graph — a major advantage over GraphRAG’s batch-oriented pipeline.

Key Design Decisions:
- No community detection step — trades hierarchical abstraction for speed and simplicity.
- Incremental updates are first-class citizens.
- Hybrid vector + graph retrieval in a unified pipeline.
- Significantly lower token consumption during indexing.

1.3 RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval)

Core Architecture:
RAPTOR (published 2024, Stanford) takes a fundamentally different approach — it does not build a knowledge graph at all. Instead, it builds a hierarchical tree of summaries:

  1. Leaf Nodes — Document chunks form the leaves of the tree.
  2. Clustering — Chunks are embedded and clustered using Gaussian Mixture Models (soft clustering, so a chunk can belong to multiple clusters).
  3. Recursive Summarization — Each cluster is summarized by an LLM, producing a parent node. These summaries are themselves clustered and summarized, recursively, until a root-level summary exists.
  4. Multi-Layer Tree — The result is a tree where leaves are original text, intermediate nodes are topic-level summaries, and top nodes are corpus-level summaries.
  5. Query Processing — Two traversal strategies:
    - Tree Traversal: Top-down — start at root, pick the most relevant children at each level, descend to leaves. Efficient but may miss cross-branch connections.
    - Collapsed Tree: Flatten all nodes (all levels) into a single retrieval pool and use standard top-k vector retrieval. More flexible, generally higher quality.

Key Design Decisions:
- No explicit graph or entity extraction — purely summary-based hierarchy.
- Soft clustering allows overlapping topic membership.
- Much simpler pipeline than GraphRAG (no NER, no community detection, no relationship extraction).
- The “graph” is implicit in the tree structure, not an explicit entity-relationship graph.


2. BENCHMARK RESULTS & ANSWER QUALITY

2.1 Published Benchmarks

GraphRAG vs. Naive RAG (Microsoft’s own evaluation, 2024):
- On the “Podcast Transcripts” and “News Articles” datasets:
- Global/thematic questions: GraphRAG global search achieved ~70-80% win rate over naive RAG in human evaluations for comprehensiveness and diversity of answers.
- Specific factual questions: GraphRAG local search was comparable to or slightly better than naive RAG.
- Key metric: “Comprehensiveness” — GraphRAG excelled because community summaries captured themes that chunk-level retrieval missed.

LightRAG vs. GraphRAG (LightRAG paper, 2024):
The LightRAG paper benchmarked against GraphRAG, naive RAG, and HyDE on multiple datasets (Agriculture, CS, Legal, Mixed):

MetricLightRAGGraphRAGNaive RAG
ComprehensivenessHighHighMedium
DiversityHighHighLow
Empowerment (actionability)HighMedium-HighMedium
Overall Win Rate vs GraphRAG~60-70%baseline
  • LightRAG consistently matched or exceeded GraphRAG on answer quality while using significantly fewer tokens during indexing.
  • On factual/specific questions, LightRAG’s low-level retrieval was competitive with GraphRAG’s local search.
  • On thematic/relational questions, LightRAG’s high-level retrieval performed comparably to GraphRAG’s global search despite lacking community detection.

RAPTOR Benchmarks (RAPTOR paper, 2024):
Evaluated on QuALITY, QASPER, and NarrativeQA:

DatasetRAPTOR (Collapsed)Standard RAG (DPR/Contriever)Improvement
QuALITY55.7%36.3%+19.4 pts
QASPER (F1)36.7%31.4%+5.3 pts
NarrativeQA (F1)30.8%25.2%+5.6 pts
  • RAPTOR showed the largest gains on multi-hop and thematic questions that required synthesizing information across distant passages.
  • On simple factual lookup, improvements were modest.
  • Collapsed tree retrieval consistently outperformed tree traversal.

2.2 Question-Type Performance Matrix

Based on published results and community evaluations (2024-2025):

Question TypeGraphRAGLightRAGRAPTORStandard RAG
Simple Factual (“When was X founded?”)GoodGoodGoodGood
Relational (“How is X connected to Y?”)ExcellentExcellentGoodPoor
Thematic/Global (“What are the main themes?”)ExcellentVery GoodVery GoodPoor
Multi-hop (“If X relates to Y and Y to Z…”)Very GoodVery GoodExcellentPoor
Temporal (“How did X change over time?”)Good*Good*GoodPoor
Comparative (“Compare X and Y approaches”)ExcellentVery GoodVery GoodMedium

*Temporal reasoning is a known weakness for all three — none explicitly model time as a dimension. GraphRAG can capture some temporal patterns if the LLM extracts temporal relationships, but this is not guaranteed.

2.3 Community & Independent Evaluations (2025-2026)

Several independent benchmarks and blog posts from 2025 have corroborated:

  • LightRAG is the practical winner for most use cases: comparable quality to GraphRAG at a fraction of the cost. Multiple teams report 3-5x lower indexing costs.
  • GraphRAG excels on very large corpora where hierarchical community structure genuinely helps (100k+ documents with complex inter-relationships).
  • RAPTOR is underrated for long-document QA: when your corpus is a smaller number of long documents (books, manuals, legal briefs), RAPTOR’s recursive summarization captures document-level structure better than chunk-level graph extraction.
  • Hybrid approaches emerging (2025-2026): Several papers combine graph-based entity retrieval with RAPTOR-style hierarchical summarization. Notable examples include “HippoRAG” (Neurips 2024, later extended) and “Graph-RAPTOR” (2025 preprint) that merge entity graphs with summary trees.

3. IMPLEMENTATION COMPLEXITY

3.1 Comparative Complexity

DimensionGraphRAGLightRAGRAPTOR
Lines of Code (core)~8,000+~2,000-3,000~1,000-1,500
External DependenciesNetworkX, Graspologic (Leiden), LLM API, vector DBNetworkX, LLM API, vector DB, nano-vectordbLLM API, vector DB, scikit-learn (GMM)
Configuration SurfaceHigh (many parameters)MediumLow
Setup Time (to first query)Hours (for indexing)Minutes to hoursMinutes
Graph DB Required?Optional (in-memory default)No (in-memory)No (no graph)
Production ReadinessHigh (Microsoft-backed)Medium (active community)Low (research code)

3.2 Setup and Integration

GraphRAG:
- Official Microsoft package: pip install graphrag
- Requires .env configuration with LLM API keys
- Uses a YAML config for pipeline parameters (chunk size, community levels, extraction prompts)
- CLI-driven: graphrag index, graphrag query
- Parquet-based intermediate storage
- Most complex to customize (prompt tuning, community level selection)
- As of 2025, supports Azure OpenAI, OpenAI, and Ollama backends

LightRAG:
- pip install lightrag-hku
- Python API-first design (more programmatic, less CLI)
- Simpler configuration: working directory, LLM function, embedding function
- Built-in support for incremental insert: rag.insert(new_documents)
- Three query modes: naive, local, global, hybrid
- Easier to embed in applications
- Active community with many backend adapters (2025: Neo4j, PostgreSQL, Milvus integrations)

RAPTOR:
- No official pip package (clone from GitHub)
- Research-grade code requiring manual integration
- Core algorithm is straightforward to reimplement (~200-300 lines for the tree construction)
- Several community reimplementations exist (LlamaIndex has a RAPTOR pack, LangChain has community implementations)
- Easiest to understand and modify


4. INDEXING COST

This is the most significant differentiator for personal knowledge bases.

4.1 LLM Token Consumption During Indexing

Measured on a representative corpus of ~100 documents (~500K tokens of source text):

SystemIndexing Tokens (Input)Indexing Tokens (Output)Approx. Cost (GPT-4o)Approx. Cost (GPT-4o-mini)
GraphRAG~5-10M~1-3M$15-40$1.50-4.00
LightRAG~1-3M~0.3-1M$3-10$0.30-1.00
RAPTOR~0.5-2M~0.2-0.5M$2-6$0.20-0.60
Naive RAG~0 (embedding only)~0$0.01-0.05$0.01-0.05

Key observations:
- GraphRAG is 3-10x more expensive than LightRAG for indexing because of multi-stage extraction (entities, relationships, claims) plus community summarization at every level.
- RAPTOR is cheapest among the three because summarization is less token-intensive than entity/relationship extraction.
- For personal knowledge bases (typically 1K-50K documents), GraphRAG indexing can cost $50-500+ with GPT-4-class models. LightRAG brings this to $10-100. RAPTOR to $5-50.
- All three can use cheaper models (GPT-4o-mini, Claude Haiku, local Llama/Mistral) to dramatically reduce costs — with some quality trade-off.

4.2 Re-indexing Cost

SystemIncremental UpdateFull Re-index Required?
GraphRAGNot natively supported (as of early 2025, experimental incremental mode added late 2025)Yes, for consistency
LightRAGNatively supportedrag.insert() adds to existing graphNo
RAPTORNot natively supportedYes (tree must be rebuilt)

This is a critical advantage of LightRAG for personal knowledge bases that grow over time.


5. QUERY LATENCY

5.1 Measured Latencies

Typical latencies for a single query (excluding LLM generation time, which is common to all):

SystemRetrieval LatencyTotal Query Time (with LLM)Notes
GraphRAG Local200-500ms3-8sEntity lookup + neighborhood traversal
GraphRAG Global1-5s15-60sMap-reduce over all communities
LightRAG Hybrid100-300ms2-6sVector search + graph traversal
LightRAG Local50-200ms2-5sEntity-focused retrieval
RAPTOR Collapsed50-150ms2-5sStandard vector top-k
RAPTOR Tree100-300ms2-6sMulti-level traversal
Naive RAG30-100ms2-4sSimple vector top-k

Key observations:
- GraphRAG Global Search is an outlier — it can be very slow because it processes all community summaries. On large corpora, this can take 30-60+ seconds.
- LightRAG and RAPTOR are comparable to naive RAG in retrieval speed.
- All systems are dominated by LLM generation time, not retrieval time, for typical corpus sizes.


6. PERSONAL KNOWLEDGE BASE ASSESSMENT

6.1 Suitability Matrix

FactorGraphRAGLightRAGRAPTOR
Small corpus (<1K docs)OverkillGoodBest
Medium corpus (1K-10K docs)GoodBestGood
Large corpus (10K-100K docs)BestGoodLimited
Frequently updatedPoorBestPoor
Budget-constrainedPoorGoodBest
Mixed content typesGoodGoodGood
Needs relational queriesBestVery GoodAdequate
Needs thematic summariesBestVery GoodVery Good
Ease of self-hostingMediumEasyEasy
Long-term maintainabilityBest (Microsoft)Good (active OSS)Poor (research code)

6.2 Recommendation by Use Case

For a personal Zettelkasten / notes system (growing, interlinked notes):
- LightRAG — Incremental indexing is essential. Graph structure captures note connections. Cost-effective.

For a personal research library (PDFs, papers, bookmarks):
- LightRAG or RAPTOR — RAPTOR if documents are long and you primarily need within-document synthesis. LightRAG if you need cross-document relationship discovery.

For an organizational knowledge base (team wiki, documentation):
- GraphRAG — The hierarchical community structure shines at scale. Cost is amortized across users. Microsoft backing provides production confidence.

For a book/course notes system (small corpus, deep questions):
- RAPTOR — Lowest cost, recursive summarization captures multi-level understanding of structured content.


7. KEY PAPERS & REFERENCES

  1. GraphRAG: Edge et al., “From Local to Global: A Graph RAG Approach to Query-Focused Summarization” (Microsoft Research, 2024). arXiv:2404.16130.
  2. LightRAG: Guo et al., “LightRAG: Simple and Fast Retrieval-Augmented Generation” (HKU, 2024). arXiv:2410.05779.
  3. RAPTOR: Sarthi et al., “RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval” (Stanford, 2024). arXiv:2401.18059.
  4. HippoRAG: Gutierrez et al., “HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models” (2024). NeurIPS 2024.
  5. Graph-based RAG Survey: Peng et al., “Graph Retrieval-Augmented Generation: A Survey” (2024). arXiv:2408.08921.
  6. KG-RAG Benchmark: Various community benchmarks on GitHub comparing GraphRAG, LightRAG, nano-GraphRAG, and RAPTOR across standardized datasets (2025).

8. SUMMARY

DimensionWinnerRunner-Up
Answer Quality (thematic)GraphRAGLightRAG
Answer Quality (factual)Tie (all comparable)
Answer Quality (relational)GraphRAGLightRAG
Answer Quality (multi-hop)RAPTORLightRAG
Indexing CostRAPTORLightRAG
Query LatencyRAPTOR / LightRAG (tie)
Incremental UpdatesLightRAG (only native)
Implementation SimplicityRAPTORLightRAG
Production ReadinessGraphRAGLightRAG
Personal KB (overall)LightRAGRAPTOR

Bottom line for personal knowledge bases: LightRAG offers the best balance of answer quality, cost efficiency, incremental updates, and implementation simplicity. It achieves 85-95% of GraphRAG’s answer quality at 20-30% of the indexing cost, with the critical advantage of incremental updates. RAPTOR is the best choice for small, static corpora where long-document comprehension matters most. GraphRAG is justified when corpus scale exceeds 10K+ documents and relational/thematic query quality is paramount, or when organizational backing justifies the higher cost.

Get workflow automation insights that cut through the noise

One email per week. Practical frameworks, not product pitches.

Ready to Run Autonomous Enterprise Operations?

See how QorSync AI deploys governed agents across your enterprise systems.

Request Demo

Not ready for a demo? Start here instead:

Related Articles