
Infinity

Multi-vector embeddings via an Infinity Embedding API server.

Overview

| Field | Value |
| --- | --- |
| Type | API |
| Modality | Text + Image |
| Provider | Infinity (self-hosted) |
| Default Model | michaelfeil/colqwen2-v0.1 |
| Env Variable | INFINITY_API_URL |
| GPU Required | No (client-side) |

Connects to a running Infinity server that serves ColPali/ColQwen2 models. The server handles GPU inference; this client only needs HTTP access. Produces multi-vector embeddings (one vector per token/patch) for MaxSim late interaction retrieval.
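To make the MaxSim late-interaction scoring concrete, here is a minimal pure-Python sketch: each query vector is matched against its best document vector (max dot product), and the maxima are summed. The toy vectors below are 2-dimensional for readability; real ColQwen2 embeddings are 128-dimensional.

```python
def maxsim(query_vecs, doc_vecs):
    """MaxSim late-interaction score: for each query vector, take the
    best-matching document vector (max dot product), then sum."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy 2-dim example (real ColQwen2 vectors are 128-dim):
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.9, 0.1], [0.2, 0.8]]
score = maxsim(query, doc)  # 0.9 + 0.8 ≈ 1.7
```

Because every query token keeps its own vector, fine-grained matches (a single token against a single image patch) contribute directly to the score instead of being averaged away.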

Uses the official infinity-client package (InfinityVisionAPI), which internally handles HTTP session management, retries with exponential backoff, base64 decoding, numpy array reshaping, and semaphore-based concurrency control.
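The base64 decoding step the client performs can be illustrated with the standard library alone: a base64 payload is decoded into raw bytes and unpacked as little-endian float32 values. This is a sketch of the idea, not the client's actual implementation, which additionally reshapes the flat array into per-token vectors.

```python
import base64
import struct

def decode_base64_floats(payload: str) -> list[float]:
    """Decode a base64 string into a flat list of little-endian
    float32 values (sketch of what an encoding: base64 response carries)."""
    raw = base64.b64decode(payload)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip check with a known vector:
vec = [0.5, -1.25, 3.0]
encoded = base64.b64encode(struct.pack("<3f", *vec)).decode()
print(decode_base64_floats(encoded))  # [0.5, -1.25, 3.0]
```

Base64 transport roughly quarters the JSON size versus `float` encoding, since four bytes per value replace a decimal string representation.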

Prerequisites

You need a running Infinity server. Start one with Docker:

docker run -it --gpus all \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  -p 7997:7997 \
  michaelf34/infinity:latest \
  v2 \
  --model-id michaelfeil/colqwen2-v0.1 \
  --port 7997
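Once the container is up, you can smoke-test it by posting to the server's OpenAI-compatible embeddings route. The sketch below only builds the request body; the endpoint path (`/embeddings`) and field names are assumptions based on the OpenAI embeddings schema that Infinity mirrors.

```python
import json

# Assumed request body for Infinity's OpenAI-compatible embeddings route;
# field names follow the OpenAI embeddings schema.
payload = {
    "model": "michaelfeil/colqwen2-v0.1",
    "input": ["What does the chart on page 3 show?"],
    "encoding_format": "base64",
}
body = json.dumps(payload)
```

Send `body` with any HTTP client, e.g. `curl -X POST http://localhost:7997/embeddings -H 'Content-Type: application/json' -d "$body"`.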

Configuration

_target_: autorag_research.embeddings.infinity.InfinityEmbeddings
model_name: michaelfeil/colqwen2-v0.1
url: ${oc.env:INFINITY_API_URL,http://localhost:7997}
encoding: base64
embed_batch_size: 10
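The `${oc.env:INFINITY_API_URL,http://localhost:7997}` resolver in the config reads the environment variable and falls back to the default when it is unset. It behaves like this stdlib lookup (a sketch of the resolution logic, not the OmegaConf implementation):

```python
import os

def resolve_url() -> str:
    """Mimic ${oc.env:INFINITY_API_URL,http://localhost:7997}:
    use the env var if set, otherwise the default."""
    return os.environ.get("INFINITY_API_URL", "http://localhost:7997")

print(resolve_url())  # http://localhost:7997 unless INFINITY_API_URL is set
```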

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | str | http://localhost:7997 | Infinity API server URL |
| model_name | str | michaelfeil/colqwen2-v0.1 | Model name served by Infinity |
| encoding | str | base64 | Response encoding: base64 or float |
| embed_batch_size | int | 10 | Batch size for batch embedding methods |
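The effect of `embed_batch_size` can be sketched as simple chunking: inputs are split into groups of at most `embed_batch_size` before requests are issued. The helper name below is illustrative, not part of the client API.

```python
def batched(items, batch_size=10):
    """Split inputs into chunks of at most batch_size, as the batch
    embedding methods do before issuing requests (sketch)."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

texts = [f"doc {i}" for i in range(23)]
sizes = [len(b) for b in batched(texts, batch_size=10)]
print(sizes)  # [10, 10, 3]
```

Smaller batches lower peak memory on the server; larger batches amortize HTTP overhead across more inputs.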

Supported Models

Any model supported by the Infinity server can be used. Common ColEncoder models:

| Model | Type | Description |
| --- | --- | --- |
| michaelfeil/colqwen2-v0.1 | ColQwen2 | Multi-modal (text + image), 128-dim |
| vidore/colpali-v1.3 | ColPali | Multi-modal (text + image), 128-dim |
| colbert-ir/colbertv2.0 | ColBERT | Text-only, 128-dim |