Multimodal Datasets

Visual document benchmarks for image-based retrieval.

Available Datasets

Dataset         Description
ViDoRe          Visual Document Retrieval
ViDoRe v2       Visual Document Retrieval v2
ViDoRe v3       Visual Document Retrieval v3
VisRAG          Visual RAG benchmark
Open-RAGBench   Open RAG benchmark from arXiv PDFs

Key Characteristics

  • Documents are stored as images (PDF pages, screenshots)
  • Retrieval requires a multimodal embedding model (e.g., ColPali)
  • BM25 text search is not available, since the documents are images rather than text
  • A GPU is recommended for embedding generation
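
To make the retrieval step concrete: ColPali-style models represent each query and each page image as a set of vectors and compare them with late-interaction (MaxSim) scoring. The following is a minimal sketch of that scoring only, assuming toy 2-dimensional vectors in place of real model output. The names `maxsim_score` and `rank_pages` are hypothetical and not part of any library or of this project's API.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction (MaxSim) score.

    For each query vector, take its best dot-product match among the
    page vectors, then sum those maxima over all query vectors.
    """
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: List[Vector], pages: Dict[str, List[Vector]]
) -> List[Tuple[str, float]]:
    """Score every page against the query and sort best-first."""
    scored = [(page_id, maxsim_score(query_vecs, vecs)) for page_id, vecs in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (assumption): real embeddings come from a multimodal model
# and have far more dimensions and vectors per page.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[0.9, 0.1], [0.2, 0.8]],
    "page_b": [[0.5, 0.5], [0.4, 0.4]],
}

ranking = rank_pages(query, pages)
for page_id, score in ranking:
    print(f"{page_id}: {score:.2f}")
```

In practice the per-page vectors would be computed once by the embedding model (the step where a GPU helps most) and stored, so that only the comparatively cheap scoring runs at query time.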