SDS KoPub VDR

Korean public-document Visual Document Retrieval benchmark.

Overview

Field            | Value
Modality         | Multimodal (Images + OCR/Text)
Generation GT    | No
HF Repository    | mteb/SDSKoPubVDRT2ITRetrieval
Primary Key Type | string
License          | CC BY-SA 4.0

Description

SDS KoPub VDR is a Korean visual document retrieval benchmark built from real public/government documents. This ingestor uses the MTEB text-to-image retrieval version of the dataset, which exposes a BEIR-style layout with corpus, queries, and qrels configs.

The dataset has three Hugging Face configs:

  • queries: text retrieval queries
  • corpus: page images plus extracted text
  • qrels: query-to-page relevance judgments
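
The relationship between the three configs can be sketched with toy in-memory rows. This is an illustration only: the column names (`id`, `text`, `query-id`, `corpus-id`, `score`) follow the usual BEIR/MTEB convention and are assumptions rather than values read from this dataset, the IDs and texts are made up, and the page images of the real corpus are omitted.

```python
from collections import defaultdict

# Toy stand-ins for the three configs. Column names follow the common
# BEIR/MTEB convention and are assumptions, not read from the dataset.
queries = [
    {"id": "q1", "text": "budget execution status"},
    {"id": "q2", "text": "population trend chart"},
]
corpus = [
    # The real corpus also carries a page image; only text is shown here.
    {"id": "doc1_p3", "text": "Table: budget execution by ministry"},
    {"id": "doc2_p7", "text": "Figure: population by year"},
]
qrels = [
    {"query-id": "q1", "corpus-id": "doc1_p3", "score": 1},
    {"query-id": "q2", "corpus-id": "doc2_p7", "score": 1},
    {"query-id": "q2", "corpus-id": "doc1_p3", "score": 0},
]


def build_qrels_index(rows):
    """Group positive relevance judgments by query ID."""
    index = defaultdict(dict)
    for row in rows:
        if row["score"] > 0:  # keep only pages judged relevant
            index[row["query-id"]][row["corpus-id"]] = row["score"]
    return dict(index)


relevant = build_qrels_index(qrels)
for query in queries:
    print(query["id"], sorted(relevant.get(query["id"], {})))
```

Each query ID resolves to the set of page IDs judged relevant, which is the lookup a retrieval metric needs.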

Ingest from Source

autorag-research ingest --name=sds_kopub_vdr --embedding-model=colpali

By default, qrels are mapped to image chunks. To evaluate against text chunks or mixed image/text retrieval instead, pass qrels-mode through --extra:

autorag-research ingest --name=sds_kopub_vdr --extra qrels-mode=mixed --embedding-model=colpali
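
One way to picture what qrels-mode changes is a small sketch that expands page-level judgments into chunk-level ones. It is not the ingestor's implementation: the function `expand_qrels`, the chunk ID format (`page:img`, `page:txt`), and the mode names `image` and `text` are hypothetical; only `mixed` appears in the command above.

```python
# Hypothetical mapping from mode name to the chunk kinds that count as
# relevant. Only "mixed" is documented; "image" and "text" are assumed names.
CHUNK_KINDS = {
    "image": ["img"],
    "text": ["txt"],
    "mixed": ["img", "txt"],
}


def expand_qrels(page_ids, mode="image"):
    """Turn relevant page IDs into relevant chunk IDs for the given mode."""
    if mode not in CHUNK_KINDS:
        raise ValueError(f"unknown qrels mode: {mode!r}")
    # Chunk IDs use a made-up "<page>:<kind>" format for illustration.
    return [f"{page}:{kind}" for page in page_ids for kind in CHUNK_KINDS[mode]]


print(expand_qrels(["doc1_p3"]))           # default: image chunks only
print(expand_qrels(["doc1_p3"], "mixed"))  # both chunk kinds for the page
```

Under this reading, the default mode judges only the page image as relevant, while mixed mode accepts either the image chunk or the text chunk of a relevant page.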

Best For

  • Korean public-document visual retrieval
  • Layout-, table-, chart-, and image-heavy document retrieval
  • Comparing image-only, text-only, and mixed page retrieval