KoViDoRe v2
Korean Visual Document Retrieval benchmark version 2.
Overview
| Field | Value |
|---|---|
| Modality | Multimodal (Images + Markdown text) |
| Generation GT | No |
| HF Repository | whybe-choi/kovidore-v2-*-beir |
| Primary Key Type | bigint |
Description
KoViDoRe v2 is a Korean visual document retrieval benchmark drawn from the whybe-choi/kovidore-benchmark-beir-v2 collection. Each dataset provides BEIR-style corpus, queries, and qrels subsets. Corpus rows contain page images plus markdown and layout metadata; query rows contain Korean retrieval questions.
Supported domains and source datasets:
- cybersecurity: whybe-choi/kovidore-v2-cybersecurity-beir
- economic: whybe-choi/kovidore-v2-economic-beir
- energy: whybe-choi/kovidore-v2-energy-beir
- hr: whybe-choi/kovidore-v2-hr-beir
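The domain names above map onto repository IDs through a fixed pattern, which can be sketched in Python as follows. The helper names `repo_id` and `load_subset` are hypothetical and not part of autorag-research. `load_subset` additionally assumes that the corpus, queries, and qrels subsets are exposed as Hugging Face dataset configurations with those names, which this page does not confirm.

```python
# Domain names as listed on this page.
DOMAINS = ("cybersecurity", "economic", "energy", "hr")


def repo_id(domain: str) -> str:
    """Return the Hugging Face repository ID for a KoViDoRe v2 domain."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain {domain!r}; expected one of {DOMAINS}")
    return f"whybe-choi/kovidore-v2-{domain}-beir"


def load_subset(domain: str, subset: str):
    """Load one subset of a domain dataset.

    Assumption: subsets are published as configurations named
    "corpus", "queries", and "qrels". Requires network access and the
    third-party `datasets` package (pip install datasets).
    """
    from datasets import load_dataset

    return load_dataset(repo_id(domain), subset)


if __name__ == "__main__":
    for domain in DOMAINS:
        print(domain, "->", repo_id(domain))
```

Calling `load_subset("hr", "queries")` would then fetch the query rows for the HR domain, provided the configuration name matches the published dataset.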
Download
autorag-research data restore kovidorev2 <dataset_name>_<embedding_model>
Ingest from Source
autorag-research ingest --name=kovidorev2 --extra dataset-name=hr --embedding-model=colpali
By default, qrels are mapped to image chunks. To evaluate text chunks or mixed image/text retrieval, set qrels-mode through an additional --extra option:
autorag-research ingest --name=kovidorev2 --extra dataset-name=hr --extra qrels-mode=mixed --embedding-model=colpali
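When ingesting several domains or comparing qrels modes, it may help to generate the commands rather than type them. The sketch below only assembles the argument list from the flags shown above; `ingest_command` is a hypothetical helper, and the only qrels-mode value this page confirms is `mixed`, so any other value passed in is the caller's assumption.

```python
import shlex
from typing import List, Optional


def ingest_command(
    dataset_name: str,
    embedding_model: str,
    qrels_mode: Optional[str] = None,
) -> List[str]:
    """Build the argv for an `autorag-research ingest` run on KoViDoRe v2.

    Only flags that appear on this page are used. When qrels_mode is
    None the option is omitted, so the tool's default applies.
    """
    argv = [
        "autorag-research",
        "ingest",
        "--name=kovidorev2",
        "--extra",
        f"dataset-name={dataset_name}",
    ]
    if qrels_mode is not None:
        argv += ["--extra", f"qrels-mode={qrels_mode}"]
    argv.append(f"--embedding-model={embedding_model}")
    return argv


if __name__ == "__main__":
    # Print one command per domain; pass each argv to subprocess.run to execute.
    for name in ("cybersecurity", "economic", "energy", "hr"):
        print(shlex.join(ingest_command(name, "colpali", qrels_mode="mixed")))
```

Returning a list instead of a single string keeps the arguments safe to hand to `subprocess.run` without shell quoting concerns.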
Best For
- Korean visual document retrieval
- Multi-page visual reasoning
- Comparing image-only, text-only, and mixed page retrieval