LLMAtlas — The Open Ecosystem Workspace for LLMs

RAG runs on two pieces of infrastructure: an embedding model and a vector database. Each comes with meaningful trade-offs.

Embedding models in 2026

The current leaders:

OpenAI text-embedding-3-large (3072 dims, $0.13 per 1M tokens)
OpenAI text-embedding-3-small (1536 dims, $0.02 per 1M tokens, about 6.5× cheaper than 3-large)
Cohere embed-v4 (multilingual and multimodal)
BGE bge-large-en-v1.5 (free, open weights, a strong English model)
Nomic nomic-embed-text-v1.5 (free, open weights, supports Matryoshka truncation)
Voyage voyage-3-large (strong on technical content and code)
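To make the price and dimension figures concrete, here is a minimal sketch that estimates one-off embedding cost and raw vector storage for the two OpenAI models. The prices and dimensions are the ones quoted above; the corpus size (10M chunks of 500 tokens) is a hypothetical assumption, and the storage figure assumes uncompressed float32 values with no index overhead.

```python
# Prices and dimensions are the figures quoted in the text.
MODELS = {
    "text-embedding-3-large": {"dims": 3072, "usd_per_million_tokens": 0.13},
    "text-embedding-3-small": {"dims": 1536, "usd_per_million_tokens": 0.02},
}

BYTES_PER_FLOAT32 = 4


def embedding_cost_usd(total_tokens: int, usd_per_million_tokens: float) -> float:
    """One-off cost of embedding total_tokens tokens."""
    return total_tokens / 1_000_000 * usd_per_million_tokens


def raw_storage_gb(num_vectors: int, dims: int) -> float:
    """Raw float32 storage in GB (10^9 bytes), excluding any index overhead."""
    return num_vectors * dims * BYTES_PER_FLOAT32 / 1e9


if __name__ == "__main__":
    num_chunks = 10_000_000   # hypothetical corpus size
    tokens_per_chunk = 500    # hypothetical average chunk length
    for name, spec in MODELS.items():
        cost = embedding_cost_usd(num_chunks * tokens_per_chunk,
                                  spec["usd_per_million_tokens"])
        size = raw_storage_gb(num_chunks, spec["dims"])
        print(f"{name}: ${cost:,.0f} to embed, {size:.1f} GB of raw vectors")
```

Under these assumptions the larger model costs roughly $650 to embed the corpus and needs about 123 GB for raw vectors, against roughly $100 and 61 GB for the smaller one.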

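Dimension trade-off: Bigger vectors capture more nuance but cost more to store and search. Many models support Matryoshka truncation: keep the first 512 dims of a 1024-dim vector, re-normalize it, and accept only a modest quality loss. This only works for models trained to put the most important information in the leading dimensions.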
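Matryoshka truncation itself is a two-step operation: slice, then re-normalize so cosine and dot-product scores stay comparable. The sketch below uses plain Python lists; the function name `truncate_and_renormalize` is illustrative and not part of any library.

```python
import math


def truncate_and_renormalize(vector: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale the result to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero prefix cannot be normalized
    return [x / norm for x in head]


if __name__ == "__main__":
    full = [0.5, 0.5, 0.5, 0.5]          # toy unit-length "embedding"
    short = truncate_and_renormalize(full, 2)
    print(len(short), math.sqrt(sum(x * x for x in short)))
```

Truncate both the stored vectors and the query vectors to the same number of dimensions, otherwise the distances are meaningless.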

Vector databases

A vector DB does one job: given a query vector, find the k nearest neighbours from millions of stored vectors, fast.
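Stripped of indexing, that job is a scoring loop plus a top-k selection. The following is a minimal brute-force sketch using cosine similarity over an in-memory dictionary; the ids and vectors are toy data, and a real vector DB replaces the full scan with an index.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_knn(query: list[float], vectors: dict[str, list[float]], k: int):
    """Return (id, similarity) pairs for the k stored vectors most similar to query."""
    scored = ((cosine_similarity(query, vec), vec_id) for vec_id, vec in vectors.items())
    return [(vec_id, sim) for sim, vec_id in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
    print(exact_knn([1.0, 0.0], store, k=2))
```

Every query touches every stored vector, which is fine for thousands of vectors and too slow for millions.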

pgvector (Postgres extension) — Start here. Add a vector column and get embeddings, filters, and transactions in one DB. Comfortable up to roughly 10M vectors.
Qdrant / Weaviate / Milvus — Dedicated open-source vector DBs. More index tuning options, richer filtering, and built to scale to billions of vectors.
Pinecone / Turbopuffer — Hosted SaaS. Zero ops, pay per query.
Chroma / LanceDB — Lightweight, embeddable. Good for local apps.
Elasticsearch / OpenSearch — Add vector search to keyword pipelines. Best for hybrid.

Approximate vs exact nearest-neighbour

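Exact NN search compares the query against every stored vector, so query time grows linearly with the size of the collection. Production systems use ANN (approximate nearest neighbour) search with HNSW or IVF indices, which typically reach 95%+ recall while running around 100× faster on large collections. The trade-off knob: more index build time and memory buy faster queries and higher recall.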
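The recall-versus-speed knob is easiest to see in a toy IVF-style index. The sketch below is a simplification under stated assumptions: it picks centroids by random sampling rather than k-means, uses squared Euclidean distance, and all function names are illustrative. Scanning fewer buckets (`nprobe`) is faster but may miss true neighbours.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_search(query, vectors, k):
    """Indices of the k nearest vectors, found by scanning everything."""
    return sorted(range(len(vectors)),
                  key=lambda i: squared_distance(query, vectors[i]))[:k]


def build_ivf(vectors, num_lists, seed=0):
    """Partition vectors into buckets around randomly sampled centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, num_lists)
    buckets = [[] for _ in range(num_lists)]
    for i, vec in enumerate(vectors):
        nearest = min(range(num_lists),
                      key=lambda c: squared_distance(vec, centroids[c]))
        buckets[nearest].append(i)
    return centroids, buckets


def ivf_search(query, vectors, centroids, buckets, k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda c: squared_distance(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in buckets[c]]
    return sorted(candidates,
                  key=lambda i: squared_distance(query, vectors[i]))[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbours that the approximate search found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(8)] for _ in range(2000)]
    query = [rng.random() for _ in range(8)]
    centroids, buckets = build_ivf(data, num_lists=20)
    truth = exact_search(query, data, k=10)
    for nprobe in (1, 5, 20):
        found = ivf_search(query, data, centroids, buckets, k=10, nprobe=nprobe)
        print(f"nprobe={nprobe}: recall@10 = {recall_at_k(found, truth):.2f}")
```

With `nprobe` equal to the number of buckets the search degenerates to an exact scan, so recall is 1.0; smaller values trade recall for fewer distance computations.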

Filtering — the often-overlooked half

In production, you rarely want pure semantic search. You want filtered semantic search: "Find the 5 most relevant chunks from this user's account, in the last 90 days, excluding archived." Vector DBs that pre-filter (apply the filter before or during the vector search) beat those that post-filter (apply it to the results afterwards). Post-filtering can leave you with 0 results, because the top-k nearest neighbours may all fail the filter.
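The failure mode is easy to reproduce on toy data. In this sketch the chunk records, the `tenant_id` values, and the function names are all hypothetical, and similarity is a plain dot product; the two closest chunks belong to another tenant, so post-filtering returns nothing while pre-filtering returns the right rows.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(query, chunks, k):
    """The k chunks with the highest dot-product similarity to the query."""
    return sorted(chunks, key=lambda c: dot(query, c["embedding"]), reverse=True)[:k]


def post_filter_search(query, chunks, k, keep):
    """Retrieve the global top-k first, then drop rows that fail the filter."""
    return [c for c in top_k(query, chunks, k) if keep(c)]


def pre_filter_search(query, chunks, k, keep):
    """Restrict to rows that pass the filter, then take the top-k among them."""
    return top_k(query, [c for c in chunks if keep(c)], k)


CHUNKS = [
    {"id": 1, "tenant_id": "other", "embedding": [1.0, 0.0]},
    {"id": 2, "tenant_id": "other", "embedding": [0.9, 0.1]},
    {"id": 3, "tenant_id": "acme", "embedding": [0.6, 0.4]},
    {"id": 4, "tenant_id": "acme", "embedding": [0.2, 0.8]},
]


def is_acme(chunk):
    return chunk["tenant_id"] == "acme"


if __name__ == "__main__":
    query = [1.0, 0.0]
    print([c["id"] for c in post_filter_search(query, CHUNKS, 2, is_acme)])
    print([c["id"] for c in pre_filter_search(query, CHUNKS, 2, is_acme)])
```

Over-fetching (retrieving many more than k before filtering) softens the problem but does not remove it when the filter is very selective.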

Practical recipe

A solid starting stack:

Embedding: OpenAI text-embedding-3-small OR BGE bge-large-en-v1.5
Vector DB: Postgres + pgvector
Index: HNSW with cosine distance
Filtering: pre-filter by tenant_id, document_type, updated_at
Retrieval: top-30 → rerank → top-5

This stack is enough for the large majority of production RAG workloads.
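The recipe above can be sketched as a pgvector schema, a filtered candidate query, and a rerank step. Treat this as a sketch under assumptions: the table name `chunks`, its columns, and the helper functions are hypothetical, the `%(name)s` placeholders assume a psycopg-style driver, and `vector(1536)` assumes text-embedding-3-small (bge-large-en-v1.5 would need 1024). The SQL is held in Python strings so nothing here needs a live database.

```python
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS chunks (
    id            bigserial PRIMARY KEY,
    tenant_id     bigint       NOT NULL,
    document_type text         NOT NULL,
    updated_at    timestamptz  NOT NULL,
    content       text         NOT NULL,
    embedding     vector(1536) NOT NULL
);

-- HNSW index using cosine distance
CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# <=> is pgvector's cosine-distance operator (smaller means more similar).
CANDIDATES_SQL = """
SELECT id, content, embedding <=> %(query)s::vector AS cosine_distance
FROM chunks
WHERE tenant_id = %(tenant_id)s
  AND document_type = %(document_type)s
  AND updated_at >= %(updated_after)s
ORDER BY embedding <=> %(query)s::vector
LIMIT %(limit)s;
"""


def to_vector_literal(embedding):
    """Format a list of floats in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def candidate_params(embedding, tenant_id, document_type, updated_after, limit=30):
    """Parameters for CANDIDATES_SQL; limit=30 is the 'top-30' stage of the recipe."""
    return {
        "query": to_vector_literal(embedding),
        "tenant_id": tenant_id,
        "document_type": document_type,
        "updated_after": updated_after,
        "limit": limit,
    }


def rerank(candidates, score_fn, top_n=5):
    """Re-score candidates with a caller-supplied relevance function, keep the best top_n."""
    return sorted(candidates, key=score_fn, reverse=True)[:top_n]
```

One caveat worth checking against the pgvector documentation: with an approximate index, the WHERE clause is applied after the index scan, so a very selective filter can return fewer rows than the limit. Recent pgvector versions offer iterative index scans for that case, and partial or per-tenant indexes are another option.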
