Embedding Model Selection for Personal Knowledge Bases: Comprehensive Research Report
1. Model Overview & Specifications
OpenAI text-embedding-3 Family
| Property | text-embedding-3-small | text-embedding-3-large |
|---|---|---|
| Max Dimensions | 1536 | 3072 |
| Adjustable Dims | Yes (e.g., 512) | Yes (e.g., 256, 1024) |
| Max Tokens | 8191 | 8191 |
| MTEB Average | ~62.3 | ~64.6 |
| Cost (per 1M tokens) | $0.02 | $0.13 |
| Multilingual | Moderate (trained on multilingual data, no explicit focus) | Moderate (trained on multilingual data, no explicit focus) |
| API Only | Yes | Yes |
Note: There is no “ada-003” model. The lineage is: text-embedding-ada-002 (legacy, 1536d) -> text-embedding-3-small/large (current as of 2025). OpenAI has not released an “ada-003.”
Key strength: Matryoshka Representation Learning (MRL) allows dimension truncation with graceful quality degradation. You can store 256d vectors for cost-sensitive applications and still get reasonable retrieval.
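As a minimal sketch in plain Python, MRL truncation is just slicing the leading dimensions and re-normalizing (the vector below is a synthetic stand-in, not a real API response):

```python
import math

def truncate_mrl(embedding, dims):
    """Keep the first `dims` values of an MRL-trained embedding and
    re-normalize so cosine similarity remains meaningful."""
    truncated = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated]

# Shrink a hypothetical 1536d vector to 256d for cheaper storage.
full = [math.sin(i) for i in range(1536)]  # stand-in for a real embedding
small = truncate_mrl(full, 256)
print(len(small))  # 256
```

Because MRL models concentrate the most important information in the leading dimensions, the truncated vector stays usable; with non-MRL models the same slice would discard information arbitrarily.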
Cohere Embed v4
| Property | Value |
|---|---|
| Dimensions | 1024 (default), adjustable |
| Max Tokens | 512 tokens |
| MTEB Average | ~66-67 (estimated) |
| Cost | ~$0.10 per 1M tokens (API) |
| Multilingual | 100+ languages (strongest multilingual of all commercial options) |
| Open Weights | No (API only) |
| Notable Feature | Input type parameter: search_document, search_query, classification, clustering |
Cohere Embed v4 (released late 2024/early 2025) succeeded Embed v3 with improved compression and multilingual handling. It supports int8 and binary quantization natively, making it extremely storage-efficient. The input-type differentiation is particularly useful for conversational memory, where queries and stored memories have different structural characteristics.
Voyage AI (voyage-3 / voyage-3-large / voyage-code-3)
| Property | voyage-3 | voyage-3-large |
|---|---|---|
| Dimensions | 1024 | 1024 |
| Max Tokens | 32,000 | 32,000 |
| MTEB Average | ~67.2 | ~68.0 |
| Cost (per 1M tokens) | $0.06 | $0.18 |
| Multilingual | Good | Good |
| Notable | Long-context retrieval | Top retrieval benchmarks |
Voyage AI was acquired by Anthropic in early 2025. Their models consistently rank at or near the top of MTEB leaderboards for retrieval tasks. The 32K token context window is a significant differentiator for conversational memory – you can embed entire conversation threads as single vectors rather than chunking them.
voyage-code-3 is specialized for code retrieval and is not relevant to this use case.
BGE Family (BAAI)
| Property | bge-large-en-v1.5 | bge-m3 | bge-en-icl |
|---|---|---|---|
| Dimensions | 1024 | 1024 | 4096 |
| Max Tokens | 512 | 8192 | 32,768 |
| MTEB Average | ~63.6 | ~65+ (multilingual) | ~67+ |
| Cost | Free (open-source) | Free | Free |
| Multilingual | English only | 100+ languages | English |
| License | MIT | MIT | MIT |
BGE-M3 is notable for supporting dense, sparse (lexical), and ColBERT-style multi-vector retrieval in a single model. This hybrid approach is extremely effective for conversational memory because:
- Dense vectors capture semantic meaning
- Sparse vectors capture exact keyword matches (names, dates, specific facts)
- ColBERT captures fine-grained token-level interactions
bge-en-icl supports in-context learning for embeddings, where you provide few-shot examples to customize retrieval behavior at inference time – powerful for personalized knowledge bases.
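A sketch of how dense and sparse scores can be fused at query time; the linear weighting and the alpha value are illustrative assumptions for tuning, not BGE-M3's internal scoring:

```python
def hybrid_score(dense_sim, sparse_sim, alpha=0.7):
    """Weighted fusion of a dense (semantic) similarity and a sparse
    (lexical) similarity. alpha = 0.7 is a tunable assumption."""
    return alpha * dense_sim + (1 - alpha) * sparse_sim

# A memory containing "John" and "Kubernetes" verbatim gets a sparse
# boost even when its dense similarity is middling.
paraphrase_only = hybrid_score(dense_sim=0.82, sparse_sim=0.10)
exact_entities = hybrid_score(dense_sim=0.70, sparse_sim=0.95)
print(exact_entities > paraphrase_only)  # True
```

In practice the two score distributions should be normalized (e.g., min-max over the candidate set) before fusing, since BM25-style scores are unbounded while cosine similarity is not.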
Nomic Embed (Nomic AI)
| Property | nomic-embed-text-v1 | nomic-embed-text-v1.5 |
|---|---|---|
| Dimensions | 768 | 768 (MRL: 64-768) |
| Max Tokens | 8192 | 8192 |
| MTEB Average | ~62.4 | ~62.3 |
| Cost | Free (open-source) / API available | Free / API |
| Multilingual | English-focused | English-focused |
| License | Apache 2.0 | Apache 2.0 |
Nomic Embed was the first fully open-source (code + weights + training data) embedding model to outperform OpenAI ada-002. V1.5 added Matryoshka support. Good balance of quality and efficiency, but lags behind top-tier models. The 8192 context window is adequate for most conversational chunks.
E5 Family (Microsoft)
| Property | e5-large-v2 | e5-mistral-7b-instruct | multilingual-e5-large-instruct |
|---|---|---|---|
| Dimensions | 1024 | 4096 | 1024 |
| Max Tokens | 512 | 32,768 | 512 |
| MTEB Average | ~61.5 | ~66.6 | ~61.1 (multilingual) |
| Cost | Free (open-source) | Free | Free |
| Multilingual | English | English | 100+ languages |
| License | MIT | MIT | MIT |
| Parameters | 335M | 7B | 560M |
e5-mistral-7b-instruct was a breakthrough: an LLM-based embedding model achieving near-SOTA results. However, at 7B parameters, inference cost is substantially higher than smaller encoder models. Requires GPU infrastructure.
Other Notable Models (2025)
| Model | Dims | MTEB Avg | Notable |
|---|---|---|---|
| Jina Embeddings v3 | 1024 (MRL) | ~65 | 8K context, task-specific LoRA adapters |
| GTE-Qwen2 (Alibaba) | 1536-4096 | ~67+ | LLM-based, strong multilingual |
| mxbai-embed-large (Mixedbread) | 1024 | ~64.7 | Binary quantization support |
| Snowflake Arctic Embed L | 1024 | ~64.2 | Open-source, retrieval-focused |
| NV-Embed-v2 (NVIDIA) | 4096 | ~69+ | Topped MTEB in late 2024, requires heavy GPU |
2. Benchmark Comparison (MTEB / BEIR)
MTEB Overall Leaderboard (as of early 2025, retrieval-focused tasks)
| Rank | Model | MTEB Avg (Retrieval) | Notes |
|---|---|---|---|
| 1 | NV-Embed-v2 | ~69.1 | 7B params, impractical for personal use |
| 2 | voyage-3-large | ~68.0 | API, excellent cost-performance |
| 3 | GTE-Qwen2-7B | ~67.8 | Open-source, needs GPU |
| 4 | voyage-3 | ~67.2 | API, best value commercial |
| 5 | Cohere Embed v4 | ~67.0 | API, best multilingual |
| 6 | bge-en-icl | ~67.0 | Open-source, in-context learning |
| 7 | e5-mistral-7b | ~66.6 | Open-source, 7B params |
| 8 | Jina v3 | ~65.0 | Open weights |
| 9 | text-embedding-3-large | ~64.6 | API, cheapest at scale |
| 10 | Snowflake Arctic Embed L | ~64.2 | Open-source |
BEIR Retrieval Benchmark (nDCG@10 averages)
BEIR is the most relevant benchmark for knowledge base retrieval. Key results:
| Model | BEIR Avg nDCG@10 | Best Domain |
|---|---|---|
| voyage-3-large | ~57.5 | Cross-domain retrieval |
| Cohere Embed v4 | ~56.8 | Multilingual queries |
| bge-en-icl | ~56.0 | With few-shot examples |
| text-embedding-3-large | ~54.8 | General purpose |
| bge-m3 (hybrid) | ~55.5 | When using dense+sparse fusion |
Important caveat: Standard benchmarks use formal queries against document corpora. Conversational memory retrieval involves informal, context-dependent queries (e.g., “what did I say about the Python project last week?”) which are structurally different from benchmark queries. Models that handle asymmetric retrieval (short informal query vs. longer stored passage) tend to perform better in practice than benchmarks suggest.
3. Analysis by Selection Criteria
A. Retrieval Accuracy for Conversational Memory
Conversational memory has unique requirements:
- Asymmetric queries: Short/informal queries against longer stored conversations
- Temporal context: “Last week”, “recently”, “that time when…”
- Entity-centric: “What did Sarah say about X?”
- Mixed formality: Casual language in queries, varied formality in stored content
- Context dependency: Queries often lack standalone meaning
Top picks for conversational memory:
- Voyage-3 / Voyage-3-large – 32K context means you can embed entire conversation sessions as single vectors, preserving conversational flow. Anthropic’s acquisition signals continued investment.
- Cohere Embed v4 – The search_query vs. search_document input-type distinction directly addresses asymmetric retrieval: the model learns different representations for queries and documents.
- BGE-M3 (hybrid mode) – Dense+sparse fusion catches both semantic similarity and exact entity matches. When someone asks “what did I discuss with John about Kubernetes?”, the sparse component catches “John” and “Kubernetes” while the dense component handles semantic matching.
- bge-en-icl – In-context learning lets you provide examples of your retrieval pattern: “Given a query like X, the relevant memory is Y.” This personalization is uniquely powerful for individual knowledge bases.
B. Dimensionality & Storage
For a personal knowledge base, storage costs matter:
| Model | Dims | Bytes/Vector (float32) | Vectors per GB |
|---|---|---|---|
| nomic-embed-text-v1.5 @128d | 128 | 512 | ~2.1M |
| text-embedding-3-small @256d | 256 | 1024 | ~1.05M |
| bge-m3 (dense) | 1024 | 4096 | ~262K |
| text-embedding-3-large @1024d | 1024 | 4096 | ~262K |
| voyage-3 | 1024 | 4096 | ~262K |
| NV-Embed-v2 | 4096 | 16384 | ~65K |
Practical recommendation: For personal knowledge bases with <1M vectors, 1024d is fine. For multi-million vector collections, use MRL models (OpenAI, Nomic, Jina) truncated to 256-512d, or Cohere’s native binary quantization (128 bytes per 1024d vector).
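The storage figures above reduce to simple arithmetic; this sketch reproduces them (raw vectors only, excluding index overhead such as HNSW graphs):

```python
def storage_bytes(n_vectors, dims, bytes_per_value=4):
    """Raw vector storage: float32 is 4 bytes per value; pass 1 for
    int8 quantization or 1/8 for binary (1 bit per dimension)."""
    return n_vectors * dims * bytes_per_value

GIB = 2**30
# 1M memories at 1024d: float32 vs. int8 vs. binary quantization.
print(storage_bytes(1_000_000, 1024) / GIB)        # ~3.8 GiB
print(storage_bytes(1_000_000, 1024, 1) / GIB)     # ~0.95 GiB
print(storage_bytes(1_000_000, 1024, 1 / 8) / GIB) # ~0.12 GiB
```

This is why Cohere's native binary quantization (128 bytes per 1024d vector) is attractive at multi-million scale: a 32x reduction relative to float32 with, per Cohere's claims, modest quality loss after rescoring.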
C. Multilingual Support
| Tier | Models | Languages | Quality |
|---|---|---|---|
| Tier 1 | Cohere Embed v4, BGE-M3 | 100+ | Excellent cross-lingual retrieval |
| Tier 2 | multilingual-e5-large-instruct, Jina v3 | 100+ | Good, but lower overall quality |
| Tier 3 | text-embedding-3-large, voyage-3 | Multilingual but English-optimized | Adequate for common languages |
| Tier 4 | Nomic, Snowflake Arctic | English-focused | Poor for non-English |
If the knowledge base is multilingual or you need cross-lingual retrieval (query in English, retrieve content in another language), Cohere Embed v4 or BGE-M3 are the clear choices.
D. Cost Per Token (API Models)
| Model | Cost per 1M tokens | Cost for 1B tokens | Free Tier |
|---|---|---|---|
| text-embedding-3-small | $0.02 | $20 | No |
| text-embedding-3-large | $0.13 | $130 | No |
| voyage-3 | $0.06 | $60 | 200M tokens free |
| voyage-3-large | $0.18 | $180 | 200M tokens free |
| Cohere Embed v4 | ~$0.10 | ~$100 | 1000 calls/month free |
| Open-source models | $0 (API) | $0 (+ infra cost) | N/A |
Infrastructure cost for open-source models: Running BGE-M3 or bge-large-en-v1.5 on a cloud GPU (e.g., T4) costs roughly $0.20-0.50/hour. For periodic batch embedding of personal notes, this is negligible. For real-time embedding on every chat message, a persistent GPU instance may cost $150-400/month.
For personal knowledge bases with moderate volume (<10M tokens/month), API costs are trivial for any provider. The self-hosted open-source path only makes economic sense at scale or when you have privacy requirements.
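The break-even arithmetic can be sketched directly; the $300/month GPU figure below is an assumed midpoint of the $150-400/month range above:

```python
def monthly_api_cost(tokens_per_month, price_per_million):
    """API embedding cost in dollars for a given monthly token volume."""
    return tokens_per_month / 1_000_000 * price_per_million

# voyage-3 at $0.06/1M tokens vs. an assumed $300/month persistent GPU:
print(monthly_api_cost(10_000_000, 0.06))     # well under $1/month
print(monthly_api_cost(5_000_000_000, 0.06))  # roughly $300/month
```

At an assumed $300/month for the GPU, the break-even sits around 5B tokens/month for voyage-3, far beyond personal-use volumes, which is why privacy rather than cost is usually the reason to self-host.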
4. Recommendations by Use Case
Primary Recommendation: Conversational AI Memory
Best overall: Voyage-3 ($0.06/1M tokens)
Rationale:
- Top-tier retrieval accuracy (MTEB ~67.2)
- 32K token context window handles full conversation threads
- Excellent asymmetric retrieval (queries vs. documents)
- Anthropic-backed, ensuring longevity and Claude ecosystem integration
- Reasonable cost
- Good enough multilingual support for most users
Best open-source: BGE-M3 (hybrid mode)
Rationale:
- Dense+sparse+ColBERT hybrid catches entities AND semantics
- 8K context, 100+ languages
- Fully self-hostable (privacy preservation)
- MIT license
- Competitive retrieval accuracy when using score fusion
Best budget: text-embedding-3-small at 512 dimensions
Rationale:
- $0.02/1M tokens (cheapest commercial option by far)
- MRL lets you reduce dimensions without retraining
- Good-enough quality for personal use
- Simplest API integration
Decision Matrix
| Priority | Recommended Model | Why |
|---|---|---|
| Maximum retrieval quality | voyage-3-large | Highest BEIR scores among practical models |
| Multilingual knowledge base | Cohere Embed v4 | Best cross-lingual retrieval |
| Privacy / self-hosted | BGE-M3 (hybrid) | Open-source, hybrid retrieval |
| Minimum cost (API) | text-embedding-3-small | $0.02/1M tokens |
| Long conversations as single vectors | voyage-3 | 32K token window |
| Personalized retrieval | bge-en-icl | In-context learning adapts to your patterns |
| Balanced all-around | voyage-3 | Best cost-quality-features ratio |
5. Architecture Recommendation for Conversational Memory
Beyond model selection, the retrieval architecture matters enormously:
- Hybrid retrieval (dense + sparse) improves recall by 10-15% over dense-only for entity-rich conversational queries. Use BGE-M3 natively, or combine any dense model with BM25.
- Two-stage retrieval: Use a cheaper/smaller model for initial recall (top-100), then a larger model or cross-encoder for reranking (top-10). Cohere Rerank v3 or a cross-encoder like bge-reranker-v2-m3 significantly improves precision.
- Temporal weighting: Multiply similarity scores by a recency decay factor. Conversations from yesterday should rank higher than semantically similar conversations from a year ago, all else being equal.
- Metadata filtering: Store timestamps, conversation participants, and topics as metadata. Filter before vector search to narrow the candidate set.
- Chunking strategy: For conversations, chunk by topic shift or turn-group (3-5 turns) rather than fixed token count. Preserve speaker attribution in chunk text.
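Temporal weighting from the list above can be sketched as an exponential decay; the 30-day half-life is an illustrative assumption to tune per user:

```python
def recency_weighted(similarity, age_days, half_life_days=30.0):
    """Decay a similarity score by memory age: a memory half_life_days
    old keeps half its score. The half-life is an assumed default."""
    return similarity * 0.5 ** (age_days / half_life_days)

# Yesterday's 0.80 match outranks a slightly better 0.85 match from a year ago.
recent = recency_weighted(0.80, age_days=1)
stale = recency_weighted(0.85, age_days=365)
print(recent > stale)  # True
```

Applying the decay after retrieval (as a rescoring step over the top-k) keeps the vector index itself unchanged, so no re-embedding is needed as memories age.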
6. Summary
For a personal conversational AI knowledge base in 2025-2026:
- If using APIs: Voyage-3 offers the best balance of retrieval quality, context length, cost, and ecosystem alignment (Anthropic). Use Cohere Embed v4 if multilingual is critical.
- If self-hosting: BGE-M3 in hybrid mode (dense+sparse fusion) with a reranker provides the best retrieval quality for free, with excellent multilingual support.
- If cost-minimizing: text-embedding-3-small at 512 dimensions is remarkably capable for $0.02/1M tokens.
- Avoid: NV-Embed-v2 and other 7B+ parameter models for personal use – the marginal quality gain does not justify the infrastructure cost.
- Important: Pair any embedding model with a reranker and hybrid (dense+BM25) retrieval for best results on conversational memory tasks.
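The recommended two-stage pattern can be sketched generically; the scoring functions here are toy stand-ins for an embedding similarity and a cross-encoder reranker, not real model calls:

```python
def two_stage_retrieve(query, memories, fast_score, rerank_score,
                       recall_k=100, final_k=10):
    """Stage 1: cheap scoring over all memories, keep top recall_k.
    Stage 2: expensive reranking over that short list, keep top final_k.
    fast_score / rerank_score stand in for a dense embedding similarity
    and a cross-encoder such as bge-reranker-v2-m3."""
    candidates = sorted(memories, key=lambda m: fast_score(query, m),
                        reverse=True)[:recall_k]
    return sorted(candidates, key=lambda m: rerank_score(query, m),
                  reverse=True)[:final_k]

# Toy demo: keyword overlap as the "fast" stage, length-penalized
# overlap as the "reranker".
docs = ["python project notes", "kubernetes chat with john", "grocery list"]
overlap = lambda q, d: len(set(q.split()) & set(d.split()))
rerank = lambda q, d: overlap(q, d) / (1 + len(d.split()))
print(two_stage_retrieve("john kubernetes", docs, overlap, rerank, 2, 1))
# ['kubernetes chat with john']
```

The economics of this pattern are the point: the expensive reranker only ever sees recall_k candidates, so its per-query cost is fixed regardless of knowledge-base size.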