
Infinity

Multi-vector embeddings via an Infinity Embedding API server.

Overview

| Field | Value |
| --- | --- |
| Type | API |
| Modality | Text + Image |
| Provider | Infinity (self-hosted) |
| Default Model | michaelfeil/colqwen2-v0.1 |
| Env Variable | INFINITY_API_URL |
| GPU Required | No (client-side) |

Connects to a running Infinity server that serves ColPali/ColQwen2 models. The server handles GPU inference; this client only needs HTTP access. Produces multi-vector embeddings (one vector per token/patch) for MaxSim late interaction retrieval.
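To make the MaxSim late-interaction scoring concrete, here is a minimal pure-Python sketch: each query vector is matched against its best document vector (max dot product), and the maxima are summed. The toy vectors below are 2-dimensional for readability; real ColQwen2 embeddings are 128-dimensional.

```python
def maxsim(query_vecs, doc_vecs):
    """MaxSim late-interaction score: for each query vector, take the
    best-matching document vector (max dot product), then sum."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy 2-dim example (real ColQwen2 vectors are 128-dim):
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.9, 0.1], [0.2, 0.8]]
score = maxsim(query, doc)  # 0.9 + 0.8 ≈ 1.7
```

Because every query token keeps its own vector, fine-grained matches (a single token against a single image patch) contribute directly to the score instead of being averaged away.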

Uses the official infinity-client package (InfinityVisionAPI), which internally handles HTTP session management, retries with exponential backoff, base64 decoding, numpy array reshaping, and semaphore-based concurrency control.
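The base64 decoding step the client performs can be illustrated with the standard library alone: a base64 payload is decoded into raw bytes and unpacked as little-endian float32 values. This is a sketch of the idea, not the client's actual implementation, which additionally reshapes the flat array into per-token vectors.

```python
import base64
import struct

def decode_base64_floats(payload: str) -> list[float]:
    """Decode a base64 string into a flat list of little-endian
    float32 values (sketch of what an encoding: base64 response carries)."""
    raw = base64.b64decode(payload)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip check with a known vector:
vec = [0.5, -1.25, 3.0]
encoded = base64.b64encode(struct.pack("<3f", *vec)).decode()
print(decode_base64_floats(encoded))  # [0.5, -1.25, 3.0]
```

Base64 transport roughly quarters the JSON size versus `float` encoding, since four bytes per value replace a decimal string representation.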

Prerequisites

You need a running Infinity server. Start one with Docker:

docker run -it --gpus all \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  -p 7997:7997 \
  michaelf34/infinity:latest \
  v2 \
  --model-id michaelfeil/colqwen2-v0.1 \
  --port 7997
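Once the container is up, you can smoke-test it by posting to the server's OpenAI-compatible embeddings route. The sketch below only builds the request body; the endpoint path (`/embeddings`) and field names are assumptions based on the OpenAI embeddings schema that Infinity mirrors.

```python
import json

# Assumed request body for Infinity's OpenAI-compatible embeddings route;
# field names follow the OpenAI embeddings schema.
payload = {
    "model": "michaelfeil/colqwen2-v0.1",
    "input": ["What does the chart on page 3 show?"],
    "encoding_format": "base64",
}
body = json.dumps(payload)
```

Send `body` with any HTTP client, e.g. `curl -X POST http://localhost:7997/embeddings -H 'Content-Type: application/json' -d "$body"`.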

Configuration

_target_: autorag_research.embeddings.infinity.InfinityEmbeddings
model_name: michaelfeil/colqwen2-v0.1
url: ${oc.env:INFINITY_API_URL,http://localhost:7997}
encoding: base64
embed_batch_size: 10
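The `${oc.env:INFINITY_API_URL,http://localhost:7997}` resolver in the config reads the environment variable and falls back to the default when it is unset. It behaves like this stdlib lookup (a sketch of the resolution logic, not the OmegaConf implementation):

```python
import os

def resolve_url() -> str:
    """Mimic ${oc.env:INFINITY_API_URL,http://localhost:7997}:
    use the env var if set, otherwise the default."""
    return os.environ.get("INFINITY_API_URL", "http://localhost:7997")

print(resolve_url())  # http://localhost:7997 unless INFINITY_API_URL is set
```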

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | str | http://localhost:7997 | Infinity API server URL |
| model_name | str | michaelfeil/colqwen2-v0.1 | Model name served by Infinity |
| encoding | str | base64 | Response encoding: base64 or float |
| embed_batch_size | int | 10 | Batch size for batch embedding methods |
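The effect of `embed_batch_size` can be sketched as simple chunking: inputs are split into groups of at most `embed_batch_size` before requests are issued. The helper name below is illustrative, not part of the client API.

```python
def batched(items, batch_size=10):
    """Split inputs into chunks of at most batch_size, as the batch
    embedding methods do before issuing requests (sketch)."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

texts = [f"doc {i}" for i in range(23)]
sizes = [len(b) for b in batched(texts, batch_size=10)]
print(sizes)  # [10, 10, 3]
```

Smaller batches lower peak memory on the server; larger batches amortize HTTP overhead across more inputs.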

Supported Models

Any model supported by the Infinity server can be used. Common ColEncoder models:

| Model | Type | Description |
| --- | --- | --- |
| michaelfeil/colqwen2-v0.1 | ColQwen2 | Multi-modal (text + image), 128-dim |
| vidore/colpali-v1.3 | ColPali | Multi-modal (text + image), 128-dim |
| colbert-ir/colbertv2.0 | ColBERT | Text-only, 128-dim |