Multimodal Datasets
Visual document benchmarks for image-based retrieval.
Available Datasets
| Dataset | Description |
|---|---|
| ViDoRe | Visual Document Retrieval |
| ViDoRe v2 | Visual Document Retrieval v2 |
| ViDoRe v3 | Visual Document Retrieval v3 |
| VisRAG | Visual RAG benchmark |
| Open-RAGBench | Open RAG benchmark from arXiv PDFs |
Key Characteristics
- Documents are stored as images (PDF pages, screenshots)
- Retrieval requires a multimodal embedding model (such as ColPali)
- BM25 text search is not available, since documents are stored as images rather than text
- A GPU is recommended for embedding generation
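
To make the embedding requirement concrete, here is a minimal sketch of how a ColPali-style model typically ranks page images: each query and each page is represented by several embedding vectors, and pages are scored with late interaction (MaxSim). This is an illustration only. The function names (`maxsim_score`, `rank_pages`) and the toy vectors are hypothetical, and this page does not state which scoring method the benchmarks here actually use. Real embeddings would come from the model and have far more dimensions.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for every query vector, take its best-matching
    page vector, then sum those best matches over the query."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: Sequence[Vector],
    pages: Dict[str, Sequence[Vector]],
) -> List[Tuple[str, float]]:
    """Return (page_id, score) pairs sorted from best to worst."""
    scored = [(page_id, maxsim_score(query_vecs, vecs)) for page_id, vecs in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy 2-dimensional embeddings (hypothetical; stand-ins for model output).
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],
    "page_b": [[0.5, 0.5], [0.5, 0.5]],
}

ranking = rank_pages(query, pages)
print(ranking)
```

With these toy vectors, `page_a` ranks first because each query vector has an exact match among its page vectors. Scoring happens entirely in embedding space, which is why no extracted text (and therefore no BM25 index) is involved.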