
Embedding Model Selection for Personal Knowledge Bases: Comprehensive Research Report


Dhawal Chheda, AI Leader at Accel4



1. Model Overview & Specifications

OpenAI text-embedding-3 Family

| Property | text-embedding-3-small | text-embedding-3-large |
|---|---|---|
| Max Dimensions | 1536 | 3072 |
| Adjustable Dims | Yes (e.g., 512) | Yes (e.g., 256, 1024) |
| Max Tokens | 8191 | 8191 |
| MTEB Average | ~62.3 | ~64.6 |
| Cost (per 1M tokens) | $0.02 | $0.13 |
| Multilingual | Moderate (multilingual training data, no explicit focus) | Moderate (same) |
| API Only | Yes | Yes |

Note: There is no “ada-003” model. The lineage is: text-embedding-ada-002 (legacy, 1536d) -> text-embedding-3-small/large (current as of 2025). OpenAI has not released an “ada-003.”

Key strength: Matryoshka Representation Learning (MRL) allows dimension truncation with graceful quality degradation. You can store 256d vectors for cost-sensitive applications and still get reasonable retrieval.
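As a concrete sketch (pure Python, with toy 8-dimensional vectors standing in for real 1536-dimensional model output), truncation plus re-normalization looks like this. Note that OpenAI's embeddings API can also do the truncation server-side via its `dimensions` parameter:

```python
import math

def truncate_mrl(vec, dims):
    """Keep the first `dims` components of a Matryoshka embedding and
    re-normalize to unit length so dot products remain valid cosines."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def cosine(a, b):
    # For unit vectors, the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# Toy 8-d vectors standing in for full-size model output.
doc = truncate_mrl([0.9, 0.1, 0.3, 0.2, 0.05, 0.0, 0.1, 0.02], 4)
query = truncate_mrl([0.8, 0.2, 0.25, 0.15, 0.1, 0.0, 0.05, 0.01], 4)
print(round(cosine(doc, query), 3))
```

With a real MRL model you would truncate the stored vectors the same way; retrieval quality degrades gradually rather than collapsing.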

Cohere Embed v4

| Property | Value |
|---|---|
| Dimensions | 1024 (default), adjustable |
| Max Tokens | 512 |
| MTEB Average | ~66-67 (estimated) |
| Cost | ~$0.10 per 1M tokens (API) |
| Multilingual | 100+ languages (strongest multilingual of all commercial options) |
| Open Weights | No (API only) |
| Notable Feature | Input type parameter: search_document, search_query, classification, clustering |

Cohere Embed v4 (released in 2025) succeeded Embed v3 with improved compression and multilingual handling. It supports int8 and binary quantization natively, making it extremely storage-efficient. The input-type differentiation is particularly useful for conversational memory, where queries and stored memories have different structural characteristics.

Voyage AI (voyage-3 / voyage-3-large / voyage-code-3)

| Property | voyage-3 | voyage-3-large |
|---|---|---|
| Dimensions | 1024 | 1024 |
| Max Tokens | 32,000 | 32,000 |
| MTEB Average | ~67.2 | ~68.0 |
| Cost (per 1M tokens) | $0.06 | $0.18 |
| Multilingual | Good | Good |
| Notable | Long-context retrieval | Top retrieval benchmarks |

Voyage AI was acquired by Anthropic in early 2025. Their models consistently rank at or near the top of MTEB leaderboards for retrieval tasks. The 32K token context window is a significant differentiator for conversational memory – you can embed entire conversation threads as single vectors rather than chunking them.

voyage-code-3 is specialized for code retrieval and is not relevant for this use case.

BGE Family (BAAI)

| Property | bge-large-en-v1.5 | bge-m3 | bge-en-icl |
|---|---|---|---|
| Dimensions | 1024 | 1024 | 4096 |
| Max Tokens | 512 | 8192 | 32,768 |
| MTEB Average | ~63.6 | ~65+ (multilingual) | ~67+ |
| Cost | Free (open-source) | Free | Free |
| Multilingual | English only | 100+ languages | English |
| License | MIT | MIT | MIT |

BGE-M3 is notable for supporting dense, sparse (lexical), and ColBERT-style multi-vector retrieval in a single model. This hybrid approach is extremely effective for conversational memory because:
- Dense vectors capture semantic meaning
- Sparse vectors capture exact keyword matches (names, dates, specific facts)
- ColBERT captures fine-grained token-level interactions
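A minimal sketch of dense+sparse score fusion, using a crude token-overlap score as a stand-in for BGE-M3's learned sparse weights, and hard-coded numbers in place of real dense cosine similarities:

```python
def sparse_score(query_tokens, doc_tokens):
    """Crude lexical overlap standing in for learned sparse weights."""
    q, d = set(query_tokens), set(doc_tokens)
    return len(q & d) / max(len(q), 1)

def fuse(dense, sparse, alpha=0.7):
    """Weighted fusion of dense and sparse scores; alpha is a tunable assumption."""
    return alpha * dense + (1 - alpha) * sparse

query = "what did i discuss with john about kubernetes".split()
memory_a = "john suggested moving the kubernetes ingress to traefik".split()
memory_b = "notes on container orchestration trade-offs".split()

# Dense scores here are stand-ins for cosine similarities from the model.
score_a = fuse(dense=0.62, sparse=sparse_score(query, memory_a))
score_b = fuse(dense=0.70, sparse=sparse_score(query, memory_b))
print(score_a > score_b)  # prints True: exact matches on "john"/"kubernetes" lift memory A
```

In practice BGE-M3 emits per-token sparse weights rather than raw overlap, but the fusion step works the same way.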

bge-en-icl supports in-context learning for embeddings, where you provide few-shot examples to customize retrieval behavior at inference time – powerful for personalized knowledge bases.

Nomic Embed (Nomic AI)

| Property | nomic-embed-text-v1 | nomic-embed-text-v1.5 |
|---|---|---|
| Dimensions | 768 | 768 (MRL: 64-768) |
| Max Tokens | 8192 | 8192 |
| MTEB Average | ~62.4 | ~62.3 |
| Cost | Free (open-source) / API available | Free / API |
| Multilingual | English-focused | English-focused |
| License | Apache 2.0 | Apache 2.0 |

Nomic Embed was the first fully open-source (code + weights + training data) embedding model to outperform OpenAI ada-002. V1.5 added Matryoshka support. Good balance of quality and efficiency, but lags behind top-tier models. The 8192 context window is adequate for most conversational chunks.

E5 Family (Microsoft)

| Property | e5-large-v2 | e5-mistral-7b-instruct | multilingual-e5-large-instruct |
|---|---|---|---|
| Dimensions | 1024 | 4096 | 1024 |
| Max Tokens | 512 | 32,768 | 512 |
| MTEB Average | ~61.5 | ~66.6 | ~61.1 (multilingual) |
| Cost | Free (open-source) | Free | Free |
| Multilingual | English | English | 100+ languages |
| License | MIT | MIT | MIT |
| Parameters | 335M | 7B | 560M |

e5-mistral-7b-instruct was a breakthrough: an LLM-based embedding model achieving near-SOTA results. However, at 7B parameters, inference cost is substantially higher than smaller encoder models. Requires GPU infrastructure.

Other Notable Models (2025)

| Model | Dims | MTEB Avg | Notable |
|---|---|---|---|
| Jina Embeddings v3 | 1024 (MRL) | ~65 | 8K context, task-specific LoRA adapters |
| GTE-Qwen2 (Alibaba) | 1536-4096 | ~67+ | LLM-based, strong multilingual |
| mxbai-embed-large (Mixedbread) | 1024 | ~64.7 | Binary quantization support |
| Snowflake Arctic Embed L | 1024 | ~64.2 | Open-source, retrieval-focused |
| NV-Embed-v2 (NVIDIA) | 4096 | ~69+ | Topped MTEB in late 2024, requires heavy GPU |

2. Benchmark Comparison (MTEB / BEIR)

MTEB Overall Leaderboard (as of early 2025, retrieval-focused tasks)

| Rank | Model | MTEB Avg (Retrieval) | Notes |
|---|---|---|---|
| 1 | NV-Embed-v2 | ~69.1 | 7B params, impractical for personal use |
| 2 | voyage-3-large | ~68.0 | API, excellent cost-performance |
| 3 | GTE-Qwen2-7B | ~67.8 | Open-source, needs GPU |
| 4 | voyage-3 | ~67.2 | API, best value commercial |
| 5 | Cohere Embed v4 | ~67.0 | API, best multilingual |
| 6 | bge-en-icl | ~67.0 | Open-source, in-context learning |
| 7 | e5-mistral-7b | ~66.6 | Open-source, 7B params |
| 8 | Jina v3 | ~65.0 | Open weights |
| 9 | text-embedding-3-large | ~64.6 | API, cheapest at scale |
| 10 | Snowflake Arctic Embed L | ~64.2 | Open-source |

BEIR Retrieval Benchmark (nDCG@10 averages)

BEIR is the most relevant benchmark for knowledge base retrieval. Key results:

| Model | BEIR Avg nDCG@10 | Best Domain |
|---|---|---|
| voyage-3-large | ~57.5 | Cross-domain retrieval |
| Cohere Embed v4 | ~56.8 | Multilingual queries |
| bge-en-icl | ~56.0 | With few-shot examples |
| bge-m3 (hybrid) | ~55.5 | When using dense+sparse fusion |
| text-embedding-3-large | ~54.8 | General purpose |

Important caveat: Standard benchmarks use formal queries against document corpora. Conversational memory retrieval involves informal, context-dependent queries (e.g., “what did I say about the Python project last week?”) which are structurally different from benchmark queries. Models that handle asymmetric retrieval (short informal query vs. longer stored passage) tend to perform better in practice than benchmarks suggest.


3. Analysis by Selection Criteria

A. Retrieval Accuracy for Conversational Memory

Conversational memory has unique requirements:
- Asymmetric queries: Short/informal queries against longer stored conversations
- Temporal context: “Last week”, “recently”, “that time when…”
- Entity-centric: “What did Sarah say about X?”
- Mixed formality: Casual language in queries, varied formality in stored content
- Context dependency: Queries often lack standalone meaning

Top picks for conversational memory:

  1. Voyage-3 / Voyage-3-large – 32K context means you can embed entire conversation sessions as single vectors, preserving conversational flow. Anthropic’s acquisition signals continued investment.

  2. Cohere Embed v4 – The search_query vs search_document input type distinction directly addresses asymmetric retrieval. The model learns different representations for queries vs. documents.

  3. BGE-M3 (hybrid mode) – Dense+sparse fusion catches both semantic similarity AND exact entity matches. When someone asks “what did I discuss with John about Kubernetes?”, the sparse component catches “John” and “Kubernetes” while the dense component handles semantic matching.

  4. bge-en-icl – In-context learning lets you provide examples of your retrieval pattern: “Given a query like X, the relevant memory is Y.” This personalization is uniquely powerful for individual knowledge bases.

B. Dimensionality & Storage

For a personal knowledge base, storage costs matter:

| Model | Dims | Bytes/Vector (float32) | Vectors per GB |
|---|---|---|---|
| nomic-embed-text-v1.5 @128d | 128 | 512 | ~2.1M |
| text-embedding-3-small @256d | 256 | 1024 | ~1.05M |
| bge-m3 (dense) | 1024 | 4096 | ~262K |
| text-embedding-3-large @1024d | 1024 | 4096 | ~262K |
| voyage-3 | 1024 | 4096 | ~262K |
| NV-Embed-v2 | 4096 | 16,384 | ~65K |

Practical recommendation: For personal knowledge bases with <1M vectors, 1024d is fine. For multi-million vector collections, use MRL models (OpenAI, Nomic, Jina) truncated to 256-512d, or Cohere’s native binary quantization (128 bytes per 1024d vector).
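The storage arithmetic behind the table is easy to reproduce; the `bytes_per_component` values for int8 and binary quantization below are the usual conventions, not model-specific figures:

```python
def bytes_per_vector(dims, bytes_per_component=4.0):
    """float32 = 4 bytes/component; int8 = 1; binary quantization = 1/8."""
    return int(dims * bytes_per_component)

def vectors_per_gb(dims, bytes_per_component=4.0):
    # 1 GiB divided by the per-vector footprint.
    return (1024 ** 3) // bytes_per_vector(dims, bytes_per_component)

print(vectors_per_gb(1024))         # float32 @ 1024d
print(vectors_per_gb(1024, 1))      # int8-quantized
print(vectors_per_gb(1024, 1 / 8))  # binary, 1 bit per dimension
```

Binary quantization of a 1024d vector gives the 128-byte representation mentioned above, roughly 8.4M vectors per GB.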

C. Multilingual Support

| Tier | Models | Languages | Quality |
|---|---|---|---|
| Tier 1 | Cohere Embed v4, BGE-M3 | 100+ | Excellent cross-lingual retrieval |
| Tier 2 | multilingual-e5-large-instruct, Jina v3 | 100+ | Good, but lower overall quality |
| Tier 3 | text-embedding-3-large, voyage-3 | Multilingual but English-optimized | Adequate for common languages |
| Tier 4 | Nomic, Snowflake Arctic | English-focused | Poor for non-English |

If the knowledge base is multilingual or you need cross-lingual retrieval (query in English, retrieve content in another language), Cohere Embed v4 or BGE-M3 are the clear choices.

D. Cost Per Token (API Models)

| Model | Cost per 1M tokens | Cost for 1B tokens | Free Tier |
|---|---|---|---|
| text-embedding-3-small | $0.02 | $20 | No |
| text-embedding-3-large | $0.13 | $130 | No |
| voyage-3 | $0.06 | $60 | 200M tokens free |
| voyage-3-large | $0.18 | $180 | 200M tokens free |
| Cohere Embed v4 | ~$0.10 | ~$100 | 1000 calls/month free |
| Open-source models | $0 (self-hosted) | $0 (+ infra cost) | N/A |

Infrastructure cost for open-source models: Running BGE-M3 or bge-large-en-v1.5 on a cloud GPU (e.g., T4) costs roughly $0.20-0.50/hour. For periodic batch embedding of personal notes, this is negligible. For real-time embedding on every chat message, a persistent GPU instance may cost $150-400/month.

For personal knowledge bases with moderate volume (<10M tokens/month), API costs are trivial for any provider. The self-hosted open-source path only makes economic sense at scale or when you have privacy requirements.
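A quick back-of-envelope check, using the prices from the table above:

```python
PRICE_PER_1M = {  # USD per 1M tokens, from the cost table above
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "voyage-3": 0.06,
    "voyage-3-large": 0.18,
    "cohere-embed-v4": 0.10,
}

def monthly_cost(tokens_per_month, model):
    """Embedding spend in USD for a given monthly token volume."""
    return tokens_per_month / 1_000_000 * PRICE_PER_1M[model]

# At the 10M tokens/month "moderate volume" ceiling, every provider is cheap:
for model, _ in sorted(PRICE_PER_1M.items(), key=lambda kv: kv[1]):
    print(f"{model}: ${monthly_cost(10_000_000, model):.2f}/month")
```

Even the most expensive option lands under $2/month at this volume, which is why infrastructure cost, not API price, dominates the self-hosting decision.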


4. Recommendations by Use Case

Primary Recommendation: Conversational AI Memory

Best overall: Voyage-3 ($0.06/1M tokens)

Rationale:
- Top-tier retrieval accuracy (MTEB ~67.2)
- 32K token context window handles full conversation threads
- Excellent asymmetric retrieval (queries vs. documents)
- Anthropic-backed, ensuring longevity and Claude ecosystem integration
- Reasonable cost
- Good enough multilingual support for most users

Best open-source: BGE-M3 (hybrid mode)

Rationale:
- Dense+sparse+ColBERT hybrid catches entities AND semantics
- 8K context, 100+ languages
- Fully self-hostable (privacy preservation)
- MIT license
- Competitive retrieval accuracy when using score fusion

Best budget: text-embedding-3-small at 512 dimensions

Rationale:
- $0.02/1M tokens (cheapest commercial option by far)
- MRL lets you reduce dimensions without retraining
- Good-enough quality for personal use
- Simplest API integration

Decision Matrix

| Priority | Recommended Model | Why |
|---|---|---|
| Maximum retrieval quality | voyage-3-large | Highest BEIR scores among practical models |
| Multilingual knowledge base | Cohere Embed v4 | Best cross-lingual retrieval |
| Privacy / self-hosted | BGE-M3 (hybrid) | Open-source, hybrid retrieval |
| Minimum cost (API) | text-embedding-3-small | $0.02/1M tokens |
| Long conversations as single vectors | voyage-3 | 32K token window |
| Personalized retrieval | bge-en-icl | In-context learning adapts to your patterns |
| Balanced all-around | voyage-3 | Best cost-quality-features ratio |

5. Architecture Recommendation for Conversational Memory

Beyond model selection, the retrieval architecture matters enormously:

  1. Hybrid retrieval (dense + sparse) improves recall by 10-15% over dense-only for entity-rich conversational queries. Use BGE-M3 natively, or combine any dense model with BM25.

  2. Two-stage retrieval: Use a cheaper/smaller model for initial recall (top-100), then a larger model or cross-encoder for reranking (top-10). Cohere Rerank v3 or a cross-encoder like bge-reranker-v2-m3 significantly improves precision.

  3. Temporal weighting: Multiply similarity scores by a recency decay factor. Conversations from yesterday should rank higher than semantically similar conversations from a year ago, all else being equal.

  4. Metadata filtering: Store timestamps, conversation participants, and topics as metadata. Filter before vector search to narrow the candidate set.

  5. Chunking strategy: For conversations, chunk by topic shift or turn-group (3-5 turns) rather than fixed token count. Preserve speaker attribution in chunk text.
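The temporal weighting in step 3 can be sketched with an exponential half-life decay; the 30-day half-life is an illustrative assumption to tune per use case:

```python
from datetime import datetime, timedelta

def recency_weight(age_days, half_life_days=30.0):
    """Exponential decay: a memory's score halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)

def rerank(candidates, now, half_life_days=30.0):
    """candidates: (similarity, timestamp) pairs -> sorted by decayed score."""
    scored = [
        (sim * recency_weight((now - ts).days, half_life_days), sim, ts)
        for sim, ts in candidates
    ]
    return sorted(scored, key=lambda t: t[0], reverse=True)

now = datetime(2025, 6, 1)
candidates = [
    (0.82, now - timedelta(days=365)),  # more similar, but a year old
    (0.74, now - timedelta(days=2)),    # slightly less similar, this week
]
best = rerank(candidates, now)[0]
print(round(best[1], 2))  # prints 0.74: the recent memory wins
```

An additive bonus instead of a multiplicative factor is a reasonable alternative if you never want old-but-exact matches to be buried entirely.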


6. Summary

For a personal conversational AI knowledge base in 2025-2026:

  • If using APIs: Voyage-3 offers the best balance of retrieval quality, context length, cost, and ecosystem alignment (Anthropic). Use Cohere Embed v4 if multilingual is critical.
  • If self-hosting: BGE-M3 in hybrid mode (dense+sparse fusion) with a reranker provides the best retrieval quality for free, with excellent multilingual support.
  • If cost-minimizing: text-embedding-3-small at 512 dimensions is remarkably capable for $0.02/1M tokens.
  • Avoid: NV-Embed-v2 and other 7B+ parameter models for personal use – the marginal quality gain does not justify the infrastructure cost.
  • Important: Pair any embedding model with a reranker and hybrid (dense+BM25) retrieval for best results on conversational memory tasks.
