Vector Search

Dense retrieval based on vector similarity.

Overview

| Field | Value |
| --- | --- |
| Type | Retrieval |
| Algorithm | Dense vector similarity |
| Modality | Text |
| Extension | VectorChord |

How It Works

Ranks documents by computing similarity between query and document embeddings.

Supports two search modes:

  1. Single-vector: Standard dense retrieval with one embedding per document
  2. Multi-vector: Late interaction (MaxSim) with multiple token-level embeddings

Uses the VectorChord PostgreSQL extension for efficient vector search.
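
For intuition, this is the general shape of a cosine-distance query that such an extension accelerates. A hedged sketch, not the pipeline's actual SQL: the connection string, table, and column are made up, and it assumes a pgvector-compatible `vector` column with the `<=>` cosine-distance operator.

```python
import psycopg

# Illustrative only: top-10 documents by cosine similarity to a query vector.
# Schema and DSN are hypothetical; `1 - (a <=> b)` converts cosine distance
# to the cosine-similarity score described in the next section.
with psycopg.connect("dbname=example") as conn:
    rows = conn.execute(
        """
        SELECT id, 1 - (embedding <=> '[0.1, 0.2, 0.3]') AS score
        FROM documents
        ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
        LIMIT 10;
        """
    ).fetchall()
```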

Score Metric

All scores are relevance scores in the range [-1, 1] where higher values indicate greater relevance.

Single-Vector Mode (Cosine Similarity)

Uses cosine similarity between query and document embeddings:

\[ \text{score} = 1 - \text{cosine\_distance} = \cos(\theta) = \frac{\mathbf{q} \cdot \mathbf{d}}{\|\mathbf{q}\| \|\mathbf{d}\|} \]

Where:

  • \(\mathbf{q}\) is the query embedding vector
  • \(\mathbf{d}\) is the document embedding vector
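
A minimal NumPy sketch of this score (illustrative, not the pipeline's implementation):

```python
import numpy as np

def cosine_score(q: np.ndarray, d: np.ndarray) -> float:
    """Relevance score in [-1, 1]: cosine of the angle between q and d."""
    return float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))
```

For unit-normalized embeddings the denominator is 1, so the score reduces to a plain dot product.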

Multi-Vector Mode (Normalized Late Interaction)

Uses the MaxSim operation, normalized by the number of query token vectors:

\[ \text{score} = \frac{1}{n} \sum_{i=1}^{n} \max_{j} (\mathbf{q}_i \cdot \mathbf{d}_j) \]

Where:

  • \(n\) is the number of query token vectors
  • \(\mathbf{q}_i\) is the \(i\)-th query token embedding
  • \(\mathbf{d}_j\) is the \(j\)-th document token embedding
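
A minimal NumPy sketch of this computation (illustrative, not the pipeline's implementation; assumes token embeddings are L2-normalized, so dot products are cosine similarities):

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Normalized late-interaction score in [-1, 1].

    query_vecs: (n, dim) query token embeddings
    doc_vecs:   (m, dim) document token embeddings
    """
    sims = query_vecs @ doc_vecs.T  # (n, m) pairwise token similarities
    # Best-matching document token per query token, averaged over query tokens.
    return float(sims.max(axis=1).mean())
```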

This normalization ensures scores remain in [-1, 1] regardless of query length, making them comparable across different queries and compatible with single-vector scores for hybrid search.

Configuration

```yaml
_target_: autorag_research.pipelines.retrieval.vector_search.VectorSearchPipelineConfig
name: vector_search
search_mode: single
top_k: 10
batch_size: 100
```
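
The `_target_` key follows the Hydra convention, so a config like this can presumably be instantiated with `hydra.utils.instantiate`. A sketch, assuming the YAML above is saved as `vector_search.yaml`:

```python
from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.load("vector_search.yaml")  # the YAML shown above
pipeline_config = instantiate(cfg)          # builds VectorSearchPipelineConfig
```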

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | required | Unique pipeline instance name |
| search_mode | str | single | Embedding mode (single or multi) |
| top_k | int | 10 | Results per query |
| batch_size | int | 100 | Queries per batch |

Search Modes

| Mode | Embedding Field | Algorithm | Use Case |
| --- | --- | --- | --- |
| single | query.embedding | Cosine similarity | Standard dense retrieval |
| multi | query.embeddings | MaxSim (late interaction) | Fine-grained token matching |

Prerequisites

Queries must have pre-computed embeddings before running the pipeline:

```python
from autorag_research.data_ingestor import DataIngestor

# session_factory: the project's database session factory;
# embedding_model: the model used to embed queries and documents.
ingestor = DataIngestor(session_factory)
ingestor.embed_all(embedding_model)  # populates the embedding/embeddings fields
```

When to Use

Good for:

  • Semantic similarity search
  • Paraphrase and synonym matching
  • Cross-lingual retrieval (with multilingual embeddings)
  • Fine-grained matching (multi-vector mode)

Consider BM25 for:

  • Exact keyword matching
  • Low latency requirements
  • No embedding model available