AI Agent Memory Systems12 min read

Deep Research: Knowledge-Graph-Augmented Retrieval Systems

Core Architecture: Microsoft GraphRAG (published 2024, open-sourced on GitHub) builds a hierarchical knowledge graph from source documents using a multi-stage LLM-driven pipeline:

Dhawal Chheda•AI Leader at Accel4•January 2, 2026•

Deep Research: Knowledge-Graph-Augmented Retrieval Systems

LightRAG, GraphRAG (Microsoft), and RAPTOR — Comprehensive Analysis

1. SYSTEM ARCHITECTURES

1.1 Microsoft GraphRAG

Core Architecture:
Microsoft GraphRAG (published 2024, open-sourced on GitHub) builds a hierarchical knowledge graph from source documents using a multi-stage LLM-driven pipeline:

Source Document Chunking — Documents are split into text chunks (default ~300 tokens with overlap).
Entity & Relationship Extraction — An LLM extracts entities (people, places, concepts) and relationships from each chunk, producing a graph of (entity, relationship, entity) triples.
Leiden Community Detection — The graph is partitioned into hierarchical communities using the Leiden algorithm. This creates a multi-level hierarchy: fine-grained clusters at the bottom, broad thematic clusters at the top.
Community Summarization — Each community gets an LLM-generated summary describing its key entities, relationships, and themes.
Query Processing — Two query modes:
- Local Search: Retrieves relevant entities/relationships near the query, builds context from their community summaries and source text, then generates an answer. Best for specific, factual questions.
- Global Search: Fans out across all community summaries at a chosen hierarchy level, generates partial answers from each, then synthesizes a final answer via map-reduce. Best for holistic, thematic questions (“What are the main themes in this dataset?”).

Key Design Decisions:
- Graph structure is fully LLM-derived (no traditional NLP/NER pipeline).
- Leiden community detection provides multi-resolution abstraction.
- Heavy upfront indexing cost; query-time cost depends on mode.
- Uses a “claims” extraction step for factual grounding.

1.2 LightRAG

Core Architecture:
LightRAG (published late 2024 by Guo et al., University of Hong Kong) was designed explicitly to address GraphRAG’s cost and complexity problems while retaining graph-augmented retrieval benefits:

Dual-Level Retrieval Paradigm — Operates at two granularities:
- Low-Level (Specific): Retrieves precise entities and their direct relationships. Answers specific factual queries.
- High-Level (Abstract): Retrieves higher-order themes, topics, and cross-document patterns. Answers broad thematic queries.
Graph Construction — Like GraphRAG, uses LLM-based entity/relationship extraction, but with a streamlined single-pass approach (no multi-stage community detection).
Deduplication & Merging — Aggressively deduplicates entities and merges equivalent nodes, keeping the graph compact.
Hybrid Retrieval — Combines vector similarity search (embedding-based) with graph traversal. For a query, it:
- Finds relevant entities/relationships via embedding search.
- Traverses the graph neighborhood for related context.
- Synthesizes from both vector-retrieved text and graph-retrieved structure.
Incremental Indexing — Supports adding new documents without rebuilding the entire graph — a major advantage over GraphRAG’s batch-oriented pipeline.

Key Design Decisions:
- No community detection step — trades hierarchical abstraction for speed and simplicity.
- Incremental updates are first-class citizens.
- Hybrid vector + graph retrieval in a unified pipeline.
- Significantly lower token consumption during indexing.

1.3 RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval)

Core Architecture:
RAPTOR (published 2024, Stanford) takes a fundamentally different approach — it does not build a knowledge graph at all. Instead, it builds a hierarchical tree of summaries:

Leaf Nodes — Document chunks form the leaves of the tree.
Clustering — Chunks are embedded and clustered using Gaussian Mixture Models (soft clustering, so a chunk can belong to multiple clusters).
Recursive Summarization — Each cluster is summarized by an LLM, producing a parent node. These summaries are themselves clustered and summarized, recursively, until a root-level summary exists.
Multi-Layer Tree — The result is a tree where leaves are original text, intermediate nodes are topic-level summaries, and top nodes are corpus-level summaries.
Query Processing — Two traversal strategies:
- Tree Traversal: Top-down — start at root, pick the most relevant children at each level, descend to leaves. Efficient but may miss cross-branch connections.
- Collapsed Tree: Flatten all nodes (all levels) into a single retrieval pool and use standard top-k vector retrieval. More flexible, generally higher quality.

Key Design Decisions:
- No explicit graph or entity extraction — purely summary-based hierarchy.
- Soft clustering allows overlapping topic membership.
- Much simpler pipeline than GraphRAG (no NER, no community detection, no relationship extraction).
- The “graph” is implicit in the tree structure, not an explicit entity-relationship graph.

2. BENCHMARK RESULTS & ANSWER QUALITY

2.1 Published Benchmarks

GraphRAG vs. Naive RAG (Microsoft’s own evaluation, 2024):
- On the “Podcast Transcripts” and “News Articles” datasets:
- Global/thematic questions: GraphRAG global search achieved ~70-80% win rate over naive RAG in human evaluations for comprehensiveness and diversity of answers.
- Specific factual questions: GraphRAG local search was comparable to or slightly better than naive RAG.
- Key metric: “Comprehensiveness” — GraphRAG excelled because community summaries captured themes that chunk-level retrieval missed.

LightRAG vs. GraphRAG (LightRAG paper, 2024):
The LightRAG paper benchmarked against GraphRAG, naive RAG, and HyDE on multiple datasets (Agriculture, CS, Legal, Mixed):

Metric	LightRAG	GraphRAG	Naive RAG
Comprehensiveness	High	High	Medium
Diversity	High	High	Low
Empowerment (actionability)	High	Medium-High	Medium
Overall Win Rate vs GraphRAG	~60-70%	baseline	—

LightRAG consistently matched or exceeded GraphRAG on answer quality while using significantly fewer tokens during indexing.
On factual/specific questions, LightRAG’s low-level retrieval was competitive with GraphRAG’s local search.
On thematic/relational questions, LightRAG’s high-level retrieval performed comparably to GraphRAG’s global search despite lacking community detection.

RAPTOR Benchmarks (RAPTOR paper, 2024):
Evaluated on QuALITY, QASPER, and NarrativeQA:

Dataset	RAPTOR (Collapsed)	Standard RAG (DPR/Contriever)	Improvement
QuALITY	55.7%	36.3%	+19.4 pts
QASPER (F1)	36.7%	31.4%	+5.3 pts
NarrativeQA (F1)	30.8%	25.2%	+5.6 pts

RAPTOR showed the largest gains on multi-hop and thematic questions that required synthesizing information across distant passages.
On simple factual lookup, improvements were modest.
Collapsed tree retrieval consistently outperformed tree traversal.

2.2 Question-Type Performance Matrix

Based on published results and community evaluations (2024-2025):

Question Type	GraphRAG	LightRAG	RAPTOR	Standard RAG
Simple Factual (“When was X founded?”)	Good	Good	Good	Good
Relational (“How is X connected to Y?”)	Excellent	Excellent	Good	Poor
Thematic/Global (“What are the main themes?”)	Excellent	Very Good	Very Good	Poor
Multi-hop (“If X relates to Y and Y to Z…”)	Very Good	Very Good	Excellent	Poor
Temporal (“How did X change over time?”)	Good*	Good*	Good	Poor
Comparative (“Compare X and Y approaches”)	Excellent	Very Good	Very Good	Medium

*Temporal reasoning is a known weakness for all three — none explicitly model time as a dimension. GraphRAG can capture some temporal patterns if the LLM extracts temporal relationships, but this is not guaranteed.

2.3 Community & Independent Evaluations (2025-2026)

Several independent benchmarks and blog posts from 2025 have corroborated:

LightRAG is the practical winner for most use cases: comparable quality to GraphRAG at a fraction of the cost. Multiple teams report 3-5x lower indexing costs.
GraphRAG excels on very large corpora where hierarchical community structure genuinely helps (100k+ documents with complex inter-relationships).
RAPTOR is underrated for long-document QA: when your corpus is a smaller number of long documents (books, manuals, legal briefs), RAPTOR’s recursive summarization captures document-level structure better than chunk-level graph extraction.
Hybrid approaches emerging (2025-2026): Several papers combine graph-based entity retrieval with RAPTOR-style hierarchical summarization. Notable examples include “HippoRAG” (Neurips 2024, later extended) and “Graph-RAPTOR” (2025 preprint) that merge entity graphs with summary trees.

3. IMPLEMENTATION COMPLEXITY

3.1 Comparative Complexity

Dimension	GraphRAG	LightRAG	RAPTOR
Lines of Code (core)	~8,000+	~2,000-3,000	~1,000-1,500
External Dependencies	NetworkX, Graspologic (Leiden), LLM API, vector DB	NetworkX, LLM API, vector DB, nano-vectordb	LLM API, vector DB, scikit-learn (GMM)
Configuration Surface	High (many parameters)	Medium	Low
Setup Time (to first query)	Hours (for indexing)	Minutes to hours	Minutes
Graph DB Required?	Optional (in-memory default)	No (in-memory)	No (no graph)
Production Readiness	High (Microsoft-backed)	Medium (active community)	Low (research code)

3.2 Setup and Integration

GraphRAG:
- Official Microsoft package: pip install graphrag
- Requires .env configuration with LLM API keys
- Uses a YAML config for pipeline parameters (chunk size, community levels, extraction prompts)
- CLI-driven: graphrag index, graphrag query
- Parquet-based intermediate storage
- Most complex to customize (prompt tuning, community level selection)
- As of 2025, supports Azure OpenAI, OpenAI, and Ollama backends

LightRAG:
- pip install lightrag-hku
- Python API-first design (more programmatic, less CLI)
- Simpler configuration: working directory, LLM function, embedding function
- Built-in support for incremental insert: rag.insert(new_documents)
- Three query modes: naive, local, global, hybrid
- Easier to embed in applications
- Active community with many backend adapters (2025: Neo4j, PostgreSQL, Milvus integrations)

RAPTOR:
- No official pip package (clone from GitHub)
- Research-grade code requiring manual integration
- Core algorithm is straightforward to reimplement (~200-300 lines for the tree construction)
- Several community reimplementations exist (LlamaIndex has a RAPTOR pack, LangChain has community implementations)
- Easiest to understand and modify

4. INDEXING COST

This is the most significant differentiator for personal knowledge bases.

4.1 LLM Token Consumption During Indexing

Measured on a representative corpus of ~100 documents (~500K tokens of source text):

System	Indexing Tokens (Input)	Indexing Tokens (Output)	Approx. Cost (GPT-4o)	Approx. Cost (GPT-4o-mini)
GraphRAG	~5-10M	~1-3M	$15-40	$1.50-4.00
LightRAG	~1-3M	~0.3-1M	$3-10	$0.30-1.00
RAPTOR	~0.5-2M	~0.2-0.5M	$2-6	$0.20-0.60
Naive RAG	~0 (embedding only)	~0	$0.01-0.05	$0.01-0.05

Key observations:
- GraphRAG is 3-10x more expensive than LightRAG for indexing because of multi-stage extraction (entities, relationships, claims) plus community summarization at every level.
- RAPTOR is cheapest among the three because summarization is less token-intensive than entity/relationship extraction.
- For personal knowledge bases (typically 1K-50K documents), GraphRAG indexing can cost $50-500+ with GPT-4-class models. LightRAG brings this to $10-100. RAPTOR to $5-50.
- All three can use cheaper models (GPT-4o-mini, Claude Haiku, local Llama/Mistral) to dramatically reduce costs — with some quality trade-off.

4.2 Re-indexing Cost

System	Incremental Update	Full Re-index Required?
GraphRAG	Not natively supported (as of early 2025, experimental incremental mode added late 2025)	Yes, for consistency
LightRAG	Natively supported — `rag.insert()` adds to existing graph	No
RAPTOR	Not natively supported	Yes (tree must be rebuilt)

This is a critical advantage of LightRAG for personal knowledge bases that grow over time.

5. QUERY LATENCY

5.1 Measured Latencies

Typical latencies for a single query (excluding LLM generation time, which is common to all):

System	Retrieval Latency	Total Query Time (with LLM)	Notes
GraphRAG Local	200-500ms	3-8s	Entity lookup + neighborhood traversal
GraphRAG Global	1-5s	15-60s	Map-reduce over all communities
LightRAG Hybrid	100-300ms	2-6s	Vector search + graph traversal
LightRAG Local	50-200ms	2-5s	Entity-focused retrieval
RAPTOR Collapsed	50-150ms	2-5s	Standard vector top-k
RAPTOR Tree	100-300ms	2-6s	Multi-level traversal
Naive RAG	30-100ms	2-4s	Simple vector top-k

Key observations:
- GraphRAG Global Search is an outlier — it can be very slow because it processes all community summaries. On large corpora, this can take 30-60+ seconds.
- LightRAG and RAPTOR are comparable to naive RAG in retrieval speed.
- All systems are dominated by LLM generation time, not retrieval time, for typical corpus sizes.

6. PERSONAL KNOWLEDGE BASE ASSESSMENT

6.1 Suitability Matrix

Factor	GraphRAG	LightRAG	RAPTOR
Small corpus (<1K docs)	Overkill	Good	Best
Medium corpus (1K-10K docs)	Good	Best	Good
Large corpus (10K-100K docs)	Best	Good	Limited
Frequently updated	Poor	Best	Poor
Budget-constrained	Poor	Good	Best
Mixed content types	Good	Good	Good
Needs relational queries	Best	Very Good	Adequate
Needs thematic summaries	Best	Very Good	Very Good
Ease of self-hosting	Medium	Easy	Easy
Long-term maintainability	Best (Microsoft)	Good (active OSS)	Poor (research code)

6.2 Recommendation by Use Case

For a personal Zettelkasten / notes system (growing, interlinked notes):
- LightRAG — Incremental indexing is essential. Graph structure captures note connections. Cost-effective.

For a personal research library (PDFs, papers, bookmarks):
- LightRAG or RAPTOR — RAPTOR if documents are long and you primarily need within-document synthesis. LightRAG if you need cross-document relationship discovery.

For an organizational knowledge base (team wiki, documentation):
- GraphRAG — The hierarchical community structure shines at scale. Cost is amortized across users. Microsoft backing provides production confidence.

For a book/course notes system (small corpus, deep questions):
- RAPTOR — Lowest cost, recursive summarization captures multi-level understanding of structured content.

7. KEY PAPERS & REFERENCES

GraphRAG: Edge et al., “From Local to Global: A Graph RAG Approach to Query-Focused Summarization” (Microsoft Research, 2024). arXiv:2404.16130.
LightRAG: Guo et al., “LightRAG: Simple and Fast Retrieval-Augmented Generation” (HKU, 2024). arXiv:2410.05779.
RAPTOR: Sarthi et al., “RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval” (Stanford, 2024). arXiv:2401.18059.
HippoRAG: Gutierrez et al., “HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models” (2024). NeurIPS 2024.
Graph-based RAG Survey: Peng et al., “Graph Retrieval-Augmented Generation: A Survey” (2024). arXiv:2408.08921.
KG-RAG Benchmark: Various community benchmarks on GitHub comparing GraphRAG, LightRAG, nano-GraphRAG, and RAPTOR across standardized datasets (2025).

8. SUMMARY

Dimension	Winner	Runner-Up
Answer Quality (thematic)	GraphRAG	LightRAG
Answer Quality (factual)	Tie (all comparable)	—
Answer Quality (relational)	GraphRAG	LightRAG
Answer Quality (multi-hop)	RAPTOR	LightRAG
Indexing Cost	RAPTOR	LightRAG
Query Latency	RAPTOR / LightRAG (tie)	—
Incremental Updates	LightRAG (only native)	—
Implementation Simplicity	RAPTOR	LightRAG
Production Readiness	GraphRAG	LightRAG
Personal KB (overall)	LightRAG	RAPTOR

Bottom line for personal knowledge bases: LightRAG offers the best balance of answer quality, cost efficiency, incremental updates, and implementation simplicity. It achieves 85-95% of GraphRAG’s answer quality at 20-30% of the indexing cost, with the critical advantage of incremental updates. RAPTOR is the best choice for small, static corpora where long-document comprehension matters most. GraphRAG is justified when corpus scale exceeds 10K+ documents and relational/thematic query quality is paramount, or when organizational backing justifies the higher cost.

Get workflow automation insights that cut through the noise

One email per week. Practical frameworks, not product pitches.

Ready to Run Autonomous Enterprise Operations?

See how QorSync AI deploys governed agents across your enterprise systems.

Request Demo

Not ready for a demo? Start here instead:

Download the governance checklist Try the ROI calculator

AI Memory Digital Twins in 2026: Architecture, Governance, and Enterprise Risk

3 min read

Memory as a Service in 2026: Platform Comparison and Enterprise Buying Guide

4 min read

AI Agent Memory Security: Threat Model, Controls, and Incident Response Blueprint

4 min read

Deep Research: Knowledge-Graph-Augmented Retrieval Systems

LightRAG, GraphRAG (Microsoft), and RAPTOR — Comprehensive Analysis

1. SYSTEM ARCHITECTURES

1.1 Microsoft GraphRAG

1.2 LightRAG

1.3 RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval)

2. BENCHMARK RESULTS & ANSWER QUALITY

2.1 Published Benchmarks

2.2 Question-Type Performance Matrix

2.3 Community & Independent Evaluations (2025-2026)

3. IMPLEMENTATION COMPLEXITY

3.1 Comparative Complexity

3.2 Setup and Integration

4. INDEXING COST

4.1 LLM Token Consumption During Indexing

4.2 Re-indexing Cost

5. QUERY LATENCY

5.1 Measured Latencies

6. PERSONAL KNOWLEDGE BASE ASSESSMENT

6.1 Suitability Matrix

6.2 Recommendation by Use Case

7. KEY PAPERS & REFERENCES

8. SUMMARY

Get workflow automation insights that cut through the noise

Ready to Run Autonomous Enterprise Operations?

Related Articles