# Infinity
Multi-vector embeddings via an Infinity Embedding API server.
## Overview
| Field | Value |
|---|---|
| Type | API |
| Modality | Text + Image |
| Provider | Infinity (self-hosted) |
| Default Model | michaelfeil/colqwen2-v0.1 |
| Env Variable | INFINITY_API_URL |
| GPU Required | No (client-side) |
Connects to a running Infinity server that serves ColPali/ColQwen2 models. The server handles GPU inference; this client only needs HTTP access. Produces multi-vector embeddings (one vector per token/patch) for MaxSim late interaction retrieval.
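To make the retrieval model concrete, here is a minimal sketch of MaxSim late-interaction scoring with NumPy: each query vector is matched against its best document vector, and the per-query maxima are summed. The function name and toy vectors are illustrative, not part of this package's API.

```python
import numpy as np

def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """MaxSim late interaction: for each query token vector, take the
    maximum dot product over all document token/patch vectors, then sum."""
    # (num_query_tokens, dim) @ (dim, num_doc_tokens) -> similarity matrix
    sim = query_emb @ doc_emb.T
    return float(sim.max(axis=1).sum())

# Toy example: 2 query token vectors vs. 3 document patch vectors (dim=4).
q = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
d = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.5, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
print(maxsim(q, d))  # best matches are 1.0 and 0.5 -> 1.5
```

This is why multi-vector embeddings are stored as one matrix per document rather than a single pooled vector.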
Uses the official infinity-client package (InfinityVisionAPI), which internally handles HTTP session management, retry with exponential backoff, base64 decoding, numpy array reshaping, and semaphore-based concurrency control.
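The base64 decoding and reshaping mentioned above can be illustrated with a small round-trip, assuming the common layout of a contiguous little-endian float32 buffer reshaped to (num_vectors, dim). The helper below is a sketch of the idea, not the infinity-client implementation.

```python
import base64
import numpy as np

def decode_base64_embedding(data: str, dim: int = 128) -> np.ndarray:
    """Decode a base64 string of packed float32 values into a
    (num_vectors, dim) multi-vector embedding array."""
    flat = np.frombuffer(base64.b64decode(data), dtype=np.float32)
    return flat.reshape(-1, dim)

# Round-trip a toy 2 x 4 multi-vector embedding.
original = np.arange(8, dtype=np.float32).reshape(2, 4)
encoded = base64.b64encode(original.tobytes()).decode("ascii")
decoded = decode_base64_embedding(encoded, dim=4)
print(decoded.shape)  # (2, 4)
```

Base64 encoding is the default because it transfers the float buffer far more compactly than a JSON array of floats.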
## Prerequisites
You need a running Infinity server. Start one with Docker:
```bash
docker run -it --gpus all \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  -p 7997:7997 \
  michaelf34/infinity:latest \
  v2 \
  --model-id michaelfeil/colqwen2-v0.1 \
  --port 7997
```
## Configuration
```yaml
_target_: autorag_research.embeddings.infinity.InfinityEmbeddings
model_name: michaelfeil/colqwen2-v0.1
url: ${oc.env:INFINITY_API_URL,http://localhost:7997}
encoding: base64
embed_batch_size: 10
```
### Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | http://localhost:7997 | Infinity API server URL |
| model_name | str | michaelfeil/colqwen2-v0.1 | Model name served by Infinity |
| encoding | str | base64 | Response encoding: base64 or float |
| embed_batch_size | int | 10 | Batch size for batch embedding methods |
## Supported Models
Any model supported by the Infinity server can be used. Common ColEncoder models:
| Model | Type | Description |
|---|---|---|
| michaelfeil/colqwen2-v0.1 | ColQwen2 | Multi-modal (text + image), 128-dim |
| vidore/colpali-v1.3 | ColPali | Multi-modal (text + image), 128-dim |
| colbert-ir/colbertv2.0 | ColBERT | Text-only, 128-dim |